What are the Findings?

James B. Rawlings
Department of Chemical and Biological Engineering
University of Wisconsin Madison
Madison, Wisconsin
April 2010

Rawlings (Wisconsin) Stating the findings 1 / 33
Why look at this problem?

- Hypothesis testing (is it a fair coin?)
- Confidence intervals (assign probability to estimate)
- Quantifying information gained from measurement
- Conditional probability
- Bayesian estimation
- Intuition requires experience; this problem provides experience
- It's a fun problem!
Is it a fair coin?

- I am given a coin
- I want to know if it is a fair coin
- So I flip it 1000 times and observe 527 heads
- What can I conclude?
Standard model of a coin

- The coin is characterized by a parameter value, θ
- The probability of flipping heads is θ and of flipping tails is 1 − θ
- The fair coin has parameter value θ = 1/2
- The parameter is a feature of the coin and does not change with time
- All flips of the coin are independent events, i.e. they are uninfluenced by the outcomes of other flips
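The standard model is easy to simulate. The sketch below (plain Python, illustrative only; the function name and fixed seed are my own choices, not from the slides) draws N independent flips from a coin with parameter θ:

```python
import random

def flip_coin(theta, N, seed=0):
    """Simulate N independent flips of a coin with P(heads) = theta.
    Returns the number of heads observed."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    return sum(rng.random() < theta for _ in range(N))

n = flip_coin(0.5, 1000)               # a fair coin, flipped 1000 times
```

Because the flips are independent with constant θ, the count n follows the binomial distribution derived on the next slide.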
How do we estimate θ from the experiment?

Say the coin has fixed, but unknown, parameter θ₀. Given the model we can compute the probability of the observation. Let n be the number of heads. The probability of n heads, each with probability θ₀, and N − n tails, each with probability 1 − θ₀, is

    p(n) = C(N, n) θ₀^n (1 − θ₀)^(N−n) = B(n, N, θ₀)

and the binomial coefficient C(N, n) accounts for the number of ways one can obtain n heads and N − n tails. This is the famous binomial distribution.
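The binomial probability B(n, N, θ) can be evaluated directly with exact integer arithmetic for the coefficient; a minimal sketch (the helper name is mine):

```python
from math import comb

def binom_pmf(n, N, theta):
    """B(n, N, theta): probability of exactly n heads in N flips."""
    return comb(N, n) * theta**n * (1 - theta)**(N - n)

# Even for a fair coin, any single count is individually unlikely:
p527 = binom_pmf(527, 1000, 0.5)       # roughly 0.006
```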
The likelihood function

We define the likelihood of the data, L(n; θ), as this same function p(n), now regarded as valid for any value of θ:

    L(n; θ) = C(N, n) θ^n (1 − θ)^(N−n)

We note that the likelihood depends on both the parameter θ and the observation n.
Likelihood function L(n; θ = 0.5) for this experiment

[Figure: L(n; θ = 0.5) plotted versus n.] Notice that L(n; θ = 0.5) is a probability density in n (its sum is one).
Likelihood function L(n = 527; θ) for this experiment

[Figure: L(n = 527; θ) plotted versus θ.] Notice that L(n = 527; θ) is not a probability density in θ (its area is not one).
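Both of these figure captions can be checked numerically; a sketch (my own helper names) that sums over n and then integrates over θ with a simple midpoint rule:

```python
from math import comb

def L(n, N, theta):
    """Likelihood L(n; theta) = C(N, n) theta^n (1 - theta)^(N - n)."""
    return comb(N, n) * theta**n * (1 - theta)**(N - n)

N = 1000
# Sum over n at fixed theta = 0.5: a probability density in n (sums to one).
total_over_n = sum(L(n, N, 0.5) for n in range(N + 1))

# Integrate over theta at fixed n = 527 (midpoint rule):
# the area is 1/(N + 1), not one, so L is not a density in theta.
m = 20000
area_over_theta = sum(L(527, N, (k + 0.5) / m) for k in range(m)) / m
```

The value 1/(N + 1) for the area reappears later as the marginal density of n.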
Maximum likelihood estimation

A sensible parameter estimate is the value of θ that maximizes the likelihood of the observation n:

    θ̂(n) = arg max over θ of L(n; θ)

For this problem, take the derivative of L with respect to θ, set it to zero, and find

    θ̂ = n/N

After observing 527 heads out of 1000 flips, we conclude θ̂ = 0.527. What could be simpler? But the question remains: can we conclude the coin is unfair? How?
Now the controversy

We want to draw a statistically valid conclusion about whether the coin is fair. This question is the same one that the social scientists are asking about whether the data support the existence of ESP. We could pose the question in the form of a yes/no hypothesis: Is the coin fair?
Hypothesis testing

"Significance testing in general has been a greatly overworked procedure, and in many cases where significance statements have been made it would have been better to provide an interval within which the value of the parameter would be expected to lie."

Box, Hunter, and Hunter (1978, p. 109)
Constructing confidence intervals

So let's instead pursue finding the confidence intervals. We have an estimator

    θ̂ = n/N

Notice that θ̂ is a random variable. Why? (What is not a random variable in this problem?)

We know the probability density of n, so let's compute the probability density of θ̂:

    p_n(n) = C(N, n) θ₀^n (1 − θ₀)^(N−n)
    p_θ̂(θ̂) = C(N, Nθ̂) θ₀^(Nθ̂) (1 − θ₀)^(N(1−θ̂))
Defining the confidence interval

Define a new random variable z = θ̂ − θ₀. We would like to find a positive scalar a > 0 such that

    Pr(−a ≤ z ≤ a) = α

in which 0 < α < 1 is the confidence level. The definition implies that there is α-level probability that the true parameter θ₀ lies in the (symmetric) confidence interval [θ̂ − a, θ̂ + a], or

    Pr(θ̂ − a ≤ θ₀ ≤ θ̂ + a) = α
What's the rub?

The problem with this approach is that the density of z depends on θ₀:

    p_θ̂(θ̂) = C(N, Nθ̂) θ₀^(Nθ̂) (1 − θ₀)^(N(1−θ̂))
    p_z(z) = C(N, N(z + θ₀)) θ₀^(N(z+θ₀)) (1 − θ₀)^(N(1−z−θ₀))

I cannot find a > 0 such that Pr(−a ≤ z ≤ a) = α unless I know θ₀!
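The dependence on θ₀ is easy to see numerically: for a fixed half-width a, the coverage Pr(−a ≤ z ≤ a) changes with θ₀. A sketch using exact binomial sums (helper name and the particular θ₀ values are mine):

```python
from math import comb

def coverage(theta0, N, a):
    """Pr(|n/N - theta0| <= a) under the binomial model with parameter theta0."""
    return sum(comb(N, n) * theta0**n * (1 - theta0)**(N - n)
               for n in range(N + 1)
               if abs(n / N - theta0) <= a)

N, a = 1000, 0.026
c_half = coverage(0.5, N, a)     # roughly 0.90 when theta0 = 0.5
c_small = coverage(0.05, N, a)   # much closer to 1 when theta0 = 0.05
```

The same half-width gives different confidence levels for different θ₀, which is exactly why a cannot be chosen without knowing θ₀.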
The effect of θ₀ on the confidence interval

[Figures: the density p_z plotted versus z for two different values of θ₀, including θ₀ = 0.5. The intervals satisfying Pr(θ̂ − a ≤ θ₀ ≤ θ̂ + a) = 0.95 differ in width for the two values of θ₀.]
OK, so now what do we do? Change the experiment

Imagine instead that I draw a random θ from a uniform distribution on the interval [0, 1]. Then I collect a sample of 1000 coin flips with this coin having value θ. What can I conclude from this experiment?

Note that both θ and n are now random variables, and they are not independent. There is no true parameter θ₀ in this problem.
Conditional density

Consider two random variables A, B. The conditional density of A given B, denoted p_{A|B}(a|b), is defined as

    p_{A|B}(a|b) = p_{A,B}(a, b) / p_B(b),    p_B(b) ≠ 0
Conditional to Bayes

    p_{A|B}(a|b) = p_{A,B}(a, b) / p_B(b) = p_{B,A}(b, a) / p_B(b) = p_{B|A}(b|a) p_A(a) / p_B(b)

    p_{A|B}(a|b) = p_{B|A}(b|a) p_A(a) / ∫ p_{B|A}(b|a) p_A(a) da

According to Papoulis (1984, p. 30), the main idea is due to Thomas Bayes in 1763, but the final form was given by Laplace several years later.
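A small discrete example of Bayes' rule may help fix ideas (the two hypotheses and the 50/50 prior are my own illustration, not from the slides): suppose a coin is either fair (θ = 0.5) or biased (θ = 0.6), each with prior probability 1/2, and we observe a single heads.

```python
# Hypotheses A in {fair, biased}; observation B = heads.
prior = {"fair": 0.5, "biased": 0.5}
likelihood = {"fair": 0.5, "biased": 0.6}   # p(heads | A)

# Bayes' rule: p(A | heads) = p(heads | A) p(A) / sum_a p(heads | a) p(a)
evidence = sum(likelihood[a] * prior[a] for a in prior)   # p(heads) = 0.55
posterior = {a: likelihood[a] * prior[a] / evidence for a in prior}
# posterior["biased"] = 0.30 / 0.55, about 0.545
```

The denominator is the sum (here) or integral (in the continuous case) that normalizes the posterior, exactly as in the formula above.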
The densities of (θ, n)

The joint density:

    p_{θ,n}(θ, n) = C(N, n) θ^n (1 − θ)^(N−n)   for θ ∈ [0, 1], n ∈ {0, 1, …, N}
    p_{θ,n}(θ, n) = 0                            otherwise

The marginal densities:

    p_θ(θ) = Σ over n in {0, …, N} of p(θ, n) = 1
    p_n(n) = ∫₀¹ p(θ, n) dθ = 1/(N + 1)
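The surprising marginal p_n(n) = 1/(N + 1), i.e. every head count is equally likely a priori, can be verified numerically for a modest N; a sketch using midpoint-rule integration over θ:

```python
from math import comb

N, m = 10, 20000
for n in range(N + 1):
    # p_n(n) = integral over [0, 1] of C(N, n) theta^n (1 - theta)^(N - n) dtheta
    p_n = sum(comb(N, n) * t**n * (1 - t)**(N - n)
              for t in ((k + 0.5) / m for k in range(m))) / m
    assert abs(p_n - 1 / (N + 1)) < 1e-6   # uniform over the N + 1 outcomes
```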
Bayesian posterior

Computing the conditional density gives

    p(θ | n) = (N + 1) C(N, n) θ^n (1 − θ)^(N−n) = (N + 1) B(n, N, θ) = β(θ; n + 1, N − n + 1)

The Bayesian posterior is the famous beta distribution. Maximizing the posterior over θ gives the Bayesian estimate

    θ̂ = arg max over θ of p(θ | n) = n/N

which agrees with the maximum likelihood estimate!
Conditional density p(θ | n = 527) for this experiment

[Figure: p(θ | n = 527) plotted versus θ.] Notice that (unlike L(n = 527; θ)) p(θ | n = 527) is a probability density in θ (its area is one).
Confidence intervals from the Bayesian posterior

Computing confidence intervals is now unambiguous. Find [a, b] such that

    ∫ from a to b of p(θ | n) dθ = α

and there is α-level probability that the random variable θ ∈ [a, b] after observation n.
Closer look at the conditional density

[Figure: the conditional density p(θ | n = 527) near its peak, with a shaded region of area α marked.]
So is the coin fair?

The Bayesian conclusion is that the 90% symmetric confidence interval centered at θ̂ = 0.527 does not contain θ = 1/2. Therefore I conclude the coin is unfair with 90% confidence level. But the symmetric interval at the α = 91.3% confidence level does include θ = 1/2. I cannot conclude the coin is unfair with greater than 91.3% confidence.
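Both confidence statements can be reproduced by numerically integrating the posterior β(θ; 528, 474); a sketch (midpoint rule, my own code; the 90% half-width of about 0.026 was found by trial):

```python
from math import comb

N, n = 1000, 527

def posterior(theta):
    """p(theta | n) = (N + 1) C(N, n) theta^n (1 - theta)^(N - n)."""
    return (N + 1) * comb(N, n) * theta**n * (1 - theta)**(N - n)

def prob(lo, hi, m=20000):
    """Pr(lo <= theta <= hi | n) by midpoint-rule integration."""
    h = (hi - lo) / m
    return h * sum(posterior(lo + (k + 0.5) * h) for k in range(m))

theta_hat = n / N
# A symmetric interval of half-width ~0.026 about 0.527 excludes 0.5 ...
p90 = prob(theta_hat - 0.026, theta_hat + 0.026)    # roughly 0.90
# ... but widening the symmetric interval until it just touches 0.5
# gives a confidence level of roughly 0.913.
p_touch = prob(0.5, 2 * theta_hat - 0.5)
```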
Back to the NY Times

"Is this significant evidence that the coin is weighted? Classical analysis says yes. With a fair coin, the chances of getting 527 or more heads in 1,000 flips is less than 1 in 20, or 5 percent. To put it another way: the experiment finds evidence of a weighted coin with 95 percent confidence."

What? That better not be classical analysis. For the binomial, it is true that

    Pr(n ≥ 527) = 1 − F(526; N = 1000, θ = 0.5) ≈ 0.047

But so what? We don't have n ≥ 527, we have n = 527.
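The quoted "less than 1 in 20" tail probability can be computed exactly with integer arithmetic (the division only rounds at the very end); a sketch:

```python
from math import comb

# Pr(n >= 527) for theta = 0.5, N = 1000: an exact rational,
# evaluated as a float at the end.
tail = sum(comb(1000, k) for k in range(527, 1001))
p = tail / 2**1000          # roughly 0.047, i.e. less than 5 percent
```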
Back to the NY Times

"Yet many statisticians do not buy it. ... It is thus more accurate, these experts say, to calculate the probability of getting that one number 527 if the coin is weighted, and compare it with the probability of getting the same number if the coin is fair. Statisticians can show that this ratio cannot be higher than about 4 to 1 ..."

Again, so what?

    r = max over θ of B(527, 1000, θ) / B(527, 1000, 0.5)
      = B(527, 1000, 0.527) / B(527, 1000, 0.5)
      = 4.30
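The 4.30 ratio is easy to reproduce because the binomial coefficients cancel; a sketch:

```python
# Ratio B(527, 1000, theta) / B(527, 1000, 0.5); C(1000, 527) cancels.
def ratio(theta):
    return (theta / 0.5)**527 * ((1 - theta) / 0.5)**473

# Maximized at theta = 527/1000, the maximum likelihood estimate:
r = ratio(0.527)                                        # roughly 4.30
r_check = max(ratio(k / 1000) for k in range(1, 1000))  # same maximizer
```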
Back to the NY Times

"The point here, said Dr. Rouder, is that 4-to-1 odds just aren't that convincing; it's not strong evidence. And yet classical significance testing has been saying for at least 80 years that this is strong evidence, Dr. Speckman said in an e-mail."

Four-to-one odds means two possible random outcomes have probabilities 0.2 and 0.8. What in this problem has probability 0.2? The quantity

    B(527, 1000, 0.5) / B(527, 1000, 0.527) ≈ 0.23

is not the probability of a random event. Where is this statement about 4-to-1 odds coming from?
What did we learn?

- Maximizing the likelihood of the observation gives a sensible estimator
- It may be problematic to compute confidence intervals when using maximum likelihood
- Conditional probability quantifies information
- Bayes' theorem is (almost) a definition of conditional probability
- If the parameter is also considered random, it's easy to construct the posterior distribution and confidence intervals
- The controversy here seems to be that some consider θ₀ a fixed parameter, but cannot then find confidence intervals that are independent of this unknown parameter
Further reading

- Thomas Bayes. An essay towards solving a problem in the doctrine of chances. Phil. Trans. Roy. Soc., 53, 1763. Reprinted in Biometrika, 35.
- George E. P. Box, William G. Hunter, and J. Stuart Hunter. Statistics for Experimenters. John Wiley & Sons, New York, 1978.
- Athanasios Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, Inc., second edition, 1984.
Questions or comments?
Study question

Consider the classic maximum likelihood problem for the linear model

    y = X θ₀ + e

in which the vector y is measured, the parameter θ₀ is unknown and to be estimated, e is normally distributed measurement error, and X is a constant matrix. When e ∼ N(0, σ²I) and we don't know σ, we obtain the following distribution for the maximum likelihood estimate:

    θ̂ ∼ N(θ₀, σ²(Xᵀ X)⁻¹)

This density contains two unknown parameters, θ₀ and σ. But we can still obtain confidence intervals in this case. What's the difference?
Hypothesis Testing Part I James J. Heckman University of Chicago Econ 312 This draft, April 20, 2006 1 1 A Brief Review of Hypothesis Testing and Its Uses values and pure significance tests (R.A. Fisher)
More informationQuantitative Understanding in Biology 1.7 Bayesian Methods
Quantitative Understanding in Biology 1.7 Bayesian Methods Jason Banfelder October 25th, 2018 1 Introduction So far, most of the methods we ve looked at fall under the heading of classical, or frequentist
More informationCHAPTER 2 Estimating Probabilities
CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2017. Tom M. Mitchell. All rights reserved. *DRAFT OF September 16, 2017* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is
More informationComputational Cognitive Science
Computational Cognitive Science Lecture 8: Frank Keller School of Informatics University of Edinburgh keller@inf.ed.ac.uk Based on slides by Sharon Goldwater October 14, 2016 Frank Keller Computational
More informationCS 361: Probability & Statistics
March 14, 2018 CS 361: Probability & Statistics Inference The prior From Bayes rule, we know that we can express our function of interest as Likelihood Prior Posterior The right hand side contains the
More informationDS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling
DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling Due: Tuesday, May 10, 2016, at 6pm (Submit via NYU Classes) Instructions: Your answers to the questions below, including
More informationStatistical Methods in Particle Physics. Lecture 2
Statistical Methods in Particle Physics Lecture 2 October 17, 2011 Silvia Masciocchi, GSI Darmstadt s.masciocchi@gsi.de Winter Semester 2011 / 12 Outline Probability Definition and interpretation Kolmogorov's
More informationBayesian Inference and MCMC
Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the
More informationIntroduction: MLE, MAP, Bayesian reasoning (28/8/13)
STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this
More informationConjugate Priors: Beta and Normal Spring 2018
Conjugate Priors: Beta and Normal 18.05 Spring 018 Review: Continuous priors, discrete data Bent coin: unknown probability θ of heads. Prior f (θ) = θ on [0,1]. Data: heads on one toss. Question: Find
More informationSome Basic Concepts of Probability and Information Theory: Pt. 2
Some Basic Concepts of Probability and Information Theory: Pt. 2 PHYS 476Q - Southern Illinois University January 22, 2018 PHYS 476Q - Southern Illinois University Some Basic Concepts of Probability and
More informationAccouncements. You should turn in a PDF and a python file(s) Figure for problem 9 should be in the PDF
Accouncements You should turn in a PDF and a python file(s) Figure for problem 9 should be in the PDF Please do not zip these files and submit (unless there are >5 files) 1 Bayesian Methods Machine Learning
More informationChapter Three. Hypothesis Testing
3.1 Introduction The final phase of analyzing data is to make a decision concerning a set of choices or options. Should I invest in stocks or bonds? Should a new product be marketed? Are my products being
More informationLecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable
Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed
More informationBayesian Models in Machine Learning
Bayesian Models in Machine Learning Lukáš Burget Escuela de Ciencias Informáticas 2017 Buenos Aires, July 24-29 2017 Frequentist vs. Bayesian Frequentist point of view: Probability is the frequency of
More informationIntro to Probability. Andrei Barbu
Intro to Probability Andrei Barbu Some problems Some problems A means to capture uncertainty Some problems A means to capture uncertainty You have data from two sources, are they different? Some problems
More informationConjugate Priors: Beta and Normal Spring 2018
Conjugate Priors: Beta and Normal 18.05 Spring 2018 Review: Continuous priors, discrete data Bent coin: unknown probability θ of heads. Prior f (θ) = 2θ on [0,1]. Data: heads on one toss. Question: Find
More informationLecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable
Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationIntroduction to Statistical Methods for High Energy Physics
Introduction to Statistical Methods for High Energy Physics 2011 CERN Summer Student Lectures Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan
More informationMaximum-Likelihood Estimation: Basic Ideas
Sociology 740 John Fox Lecture Notes Maximum-Likelihood Estimation: Basic Ideas Copyright 2014 by John Fox Maximum-Likelihood Estimation: Basic Ideas 1 I The method of maximum likelihood provides estimators
More informationPoint Estimation. Vibhav Gogate The University of Texas at Dallas
Point Estimation Vibhav Gogate The University of Texas at Dallas Some slides courtesy of Carlos Guestrin, Chris Bishop, Dan Weld and Luke Zettlemoyer. Basics: Expectation and Variance Binary Variables
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University August 30, 2017 Today: Decision trees Overfitting The Big Picture Coming soon Probabilistic learning MLE,
More informationBayesian Methods. David S. Rosenberg. New York University. March 20, 2018
Bayesian Methods David S. Rosenberg New York University March 20, 2018 David S. Rosenberg (New York University) DS-GA 1003 / CSCI-GA 2567 March 20, 2018 1 / 38 Contents 1 Classical Statistics 2 Bayesian
More informationFrequentist Statistics and Hypothesis Testing Spring
Frequentist Statistics and Hypothesis Testing 18.05 Spring 2018 http://xkcd.com/539/ Agenda Introduction to the frequentist way of life. What is a statistic? NHST ingredients; rejection regions Simple
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationLecture 1: Probability Fundamentals
Lecture 1: Probability Fundamentals IB Paper 7: Probability and Statistics Carl Edward Rasmussen Department of Engineering, University of Cambridge January 22nd, 2008 Rasmussen (CUED) Lecture 1: Probability
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters
More informationSome slides from Carlos Guestrin, Luke Zettlemoyer & K Gajos 2
Logistics CSE 446: Point Estimation Winter 2012 PS2 out shortly Dan Weld Some slides from Carlos Guestrin, Luke Zettlemoyer & K Gajos 2 Last Time Random variables, distributions Marginal, joint & conditional
More informationParametric Techniques Lecture 3
Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to
More informationStatistical Methods for Particle Physics Lecture 1: parameter estimation, statistical tests
Statistical Methods for Particle Physics Lecture 1: parameter estimation, statistical tests http://benasque.org/2018tae/cgi-bin/talks/allprint.pl TAE 2018 Benasque, Spain 3-15 Sept 2018 Glen Cowan Physics
More informationBayesian hypothesis testing
Bayesian hypothesis testing Dr. Jarad Niemi STAT 544 - Iowa State University February 21, 2018 Jarad Niemi (STAT544@ISU) Bayesian hypothesis testing February 21, 2018 1 / 25 Outline Scientific method Statistical
More informationIntroduc)on to Bayesian Methods
Introduc)on to Bayesian Methods Bayes Rule py x)px) = px! y) = px y)py) py x) = px y)py) px) px) =! px! y) = px y)py) y py x) = py x) =! y "! y px y)py) px y)py) px y)py) px y)py)dy Bayes Rule py x) =
More informationAdvanced Probabilistic Modeling in R Day 1
Advanced Probabilistic Modeling in R Day 1 Roger Levy University of California, San Diego July 20, 2015 1/24 Today s content Quick review of probability: axioms, joint & conditional probabilities, Bayes
More informationParametric Techniques
Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure
More informationRecursive Estimation
Recursive Estimation Raffaello D Andrea Spring 08 Problem Set : Bayes Theorem and Bayesian Tracking Last updated: March, 08 Notes: Notation: Unless otherwise noted, x, y, and z denote random variables,
More informationComparison of Bayesian and Frequentist Inference
Comparison of Bayesian and Frequentist Inference 18.05 Spring 2014 First discuss last class 19 board question, January 1, 2017 1 /10 Compare Bayesian inference Uses priors Logically impeccable Probabilities
More informationStatistical Methods in Particle Physics Lecture 1: Bayesian methods
Statistical Methods in Particle Physics Lecture 1: Bayesian methods SUSSP65 St Andrews 16 29 August 2009 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee and Andrew O. Finley 2 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationMACHINE LEARNING INTRODUCTION: STRING CLASSIFICATION
MACHINE LEARNING INTRODUCTION: STRING CLASSIFICATION THOMAS MAILUND Machine learning means different things to different people, and there is no general agreed upon core set of algorithms that must be
More informationComputational Cognitive Science
Computational Cognitive Science Lecture 9: Bayesian Estimation Chris Lucas (Slides adapted from Frank Keller s) School of Informatics University of Edinburgh clucas2@inf.ed.ac.uk 17 October, 2017 1 / 28
More informationDiscrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14
CS 70 Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14 Introduction One of the key properties of coin flips is independence: if you flip a fair coin ten times and get ten
More informationSAMPLE CHAPTER. Avi Pfeffer. FOREWORD BY Stuart Russell MANNING
SAMPLE CHAPTER Avi Pfeffer FOREWORD BY Stuart Russell MANNING Practical Probabilistic Programming by Avi Pfeffer Chapter 9 Copyright 2016 Manning Publications brief contents PART 1 INTRODUCING PROBABILISTIC
More informationBayes Formula. MATH 107: Finite Mathematics University of Louisville. March 26, 2014
Bayes Formula MATH 07: Finite Mathematics University of Louisville March 26, 204 Test Accuracy Conditional reversal 2 / 5 A motivating question A rare disease occurs in out of every 0,000 people. A test
More informationInference Control and Driving of Natural Systems
Inference Control and Driving of Natural Systems MSci/MSc/MRes nick.jones@imperial.ac.uk The fields at play We will be drawing on ideas from Bayesian Cognitive Science (psychology and neuroscience), Biological
More informationIntroduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf
1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a
More informationHST.582J / 6.555J / J Biomedical Signal and Image Processing Spring 2007
MIT OpenCourseWare http://ocw.mit.edu HST.582J / 6.555J / 16.456J Biomedical Signal and Image Processing Spring 2007 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationReadings: K&F: 16.3, 16.4, Graphical Models Carlos Guestrin Carnegie Mellon University October 6 th, 2008
Readings: K&F: 16.3, 16.4, 17.3 Bayesian Param. Learning Bayesian Structure Learning Graphical Models 10708 Carlos Guestrin Carnegie Mellon University October 6 th, 2008 10-708 Carlos Guestrin 2006-2008
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More informationPhysics 403. Segev BenZvi. Choosing Priors and the Principle of Maximum Entropy. Department of Physics and Astronomy University of Rochester
Physics 403 Choosing Priors and the Principle of Maximum Entropy Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Odds Ratio Occam Factors
More informationCS4705. Probability Review and Naïve Bayes. Slides from Dragomir Radev
CS4705 Probability Review and Naïve Bayes Slides from Dragomir Radev Classification using a Generative Approach Previously on NLP discriminative models P C D here is a line with all the social media posts
More informationMachine Learning. Probability Basics. Marc Toussaint University of Stuttgart Summer 2014
Machine Learning Probability Basics Basic definitions: Random variables, joint, conditional, marginal distribution, Bayes theorem & examples; Probability distributions: Binomial, Beta, Multinomial, Dirichlet,
More informationBayesian Learning (II)
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP
More informationLecture 23 Maximum Likelihood Estimation and Bayesian Inference
Lecture 23 Maximum Likelihood Estimation and Bayesian Inference Thais Paiva STA 111 - Summer 2013 Term II August 7, 2013 1 / 31 Thais Paiva STA 111 - Summer 2013 Term II Lecture 23, 08/07/2013 Lecture
More informationMachine Learning
Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 13, 2011 Today: The Big Picture Overfitting Review: probability Readings: Decision trees, overfiting
More informationStatistical Data Analysis Stat 3: p-values, parameter estimation
Statistical Data Analysis Stat 3: p-values, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,
More informationBasics of Statistical Estimation
Basics of Statistical Estimation Doug Downey, Nortwestern EECS 395/495, Spring 206 (several illustrations from P. Domingos, University of Wasington CSE Bayes Rule P(A B = P(B A P(A / P(B Example: P(symptom
More informationProbability Theory Review
Probability Theory Review Brendan O Connor 10-601 Recitation Sept 11 & 12, 2012 1 Mathematical Tools for Machine Learning Probability Theory Linear Algebra Calculus Wikipedia is great reference 2 Probability
More informationHypothesis Testing. File: /General/MLAB-Text/Papers/hyptest.tex
File: /General/MLAB-Text/Papers/hyptest.tex Hypothesis Testing Gary D. Knott, Ph.D. Civilized Software, Inc. 12109 Heritage Park Circle Silver Spring, MD 20906 USA Tel. (301) 962-3711 Email: csi@civilized.com
More informationLanguage as a Stochastic Process
CS769 Spring 2010 Advanced Natural Language Processing Language as a Stochastic Process Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu 1 Basic Statistics for NLP Pick an arbitrary letter x at random from any
More informationProbability and Estimation. Alan Moses
Probability and Estimation Alan Moses Random variables and probability A random variable is like a variable in algebra (e.g., y=e x ), but where at least part of the variability is taken to be stochastic.
More informationClassical and Bayesian inference
Classical and Bayesian inference AMS 132 Claudia Wehrhahn (UCSC) Classical and Bayesian inference January 8 1 / 11 The Prior Distribution Definition Suppose that one has a statistical model with parameter
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationMATH 19B FINAL EXAM PROBABILITY REVIEW PROBLEMS SPRING, 2010
MATH 9B FINAL EXAM PROBABILITY REVIEW PROBLEMS SPRING, 00 This handout is meant to provide a collection of exercises that use the material from the probability and statistics portion of the course The
More informationCENTRAL LIMIT THEOREM (CLT)
CENTRAL LIMIT THEOREM (CLT) A sampling distribution is the probability distribution of the sample statistic that is formed when samples of size n are repeatedly taken from a population. If the sample statistic
More information2011 Pearson Education, Inc
Statistics for Business and Economics Chapter 3 Probability Contents 1. Events, Sample Spaces, and Probability 2. Unions and Intersections 3. Complementary Events 4. The Additive Rule and Mutually Exclusive
More informationIntro to Bayesian Methods
Intro to Bayesian Methods Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601 Lecture 1 1 Course Webpage Syllabus LaTeX reference manual R markdown reference manual Please come to office
More informationMachine Learning CSE546 Carlos Guestrin University of Washington. September 30, 2013
Bayesian Methods Machine Learning CSE546 Carlos Guestrin University of Washington September 30, 2013 1 What about prior n Billionaire says: Wait, I know that the thumbtack is close to 50-50. What can you
More informationDiscrete Probability Distribution Tables
Section 5 A : Discrete Probability Distributions Introduction Discrete Probability Distribution ables A probability distribution table is like the relative frequency tables that we constructed in chapter.
More informationHypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33
Hypothesis Testing Econ 690 Purdue University Justin L. Tobias (Purdue) Testing 1 / 33 Outline 1 Basic Testing Framework 2 Testing with HPD intervals 3 Example 4 Savage Dickey Density Ratio 5 Bartlett
More informationIntroduction to Bayesian Statistics. James Swain University of Alabama in Huntsville ISEEM Department
Introduction to Bayesian Statistics James Swain University of Alabama in Huntsville ISEEM Department Author Introduction James J. Swain is Professor of Industrial and Systems Engineering Management at
More informationData Analysis and Monte Carlo Methods
Lecturer: Allen Caldwell, Max Planck Institute for Physics & TUM Recitation Instructor: Oleksander (Alex) Volynets, MPP & TUM General Information: - Lectures will be held in English, Mondays 16-18:00 -
More informationCS 188: Artificial Intelligence Spring Today
CS 188: Artificial Intelligence Spring 2006 Lecture 9: Naïve Bayes 2/14/2006 Dan Klein UC Berkeley Many slides from either Stuart Russell or Andrew Moore Bayes rule Today Expectations and utilities Naïve
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationHypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3
Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest
More information