What are the Findings?


What are the Findings?
James B. Rawlings
Department of Chemical and Biological Engineering
University of Wisconsin-Madison
Madison, Wisconsin
April 2010

Why look at this problem?
- Hypothesis testing (is it a fair coin?)
- Confidence intervals (assign probability to estimate)
- Quantifying information gained from measurement
- Conditional probability
- Bayesian estimation
- Intuition requires experience; this problem provides experience
- It's a fun problem!


Is it a fair coin?
- I am given a coin
- I want to know if it is a fair coin
- So I flip it 1000 times and observe 527 heads
- What can I conclude?


Standard model of a coin
- The coin is characterized by a parameter value, θ
- The probability of flipping a heads is θ and tails is 1 - θ
- The fair coin has parameter value θ = 1/2
- The parameter is a feature of the coin and does not change with time
- All flips of the coin are independent events, i.e., they are uninfluenced by the outcomes of other flips


How do we estimate θ from the experiment?
- Say the coin has fixed, but unknown, parameter θ0.
- Given the model we can compute the probability of the observation. Let n be the number of heads. The probability of n heads, each with probability θ0, and N - n tails, each with probability 1 - θ0, is

  p(n) = \binom{N}{n} \theta_0^n (1 - \theta_0)^{N-n} = B(n, N, \theta_0)

  and the binomial coefficient \binom{N}{n} accounts for the number of ways one can obtain n heads and N - n tails.
- This is the famous binomial distribution.
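As a quick numerical check, here is a minimal sketch in Python (assuming scipy is available; the variable names are illustrative, not from the slides) that evaluates the binomial model for this experiment:

```python
from scipy.stats import binom

N, n = 1000, 527      # number of flips and observed heads
theta0 = 0.5          # candidate fair-coin parameter

# p(n) = C(N, n) * theta0^n * (1 - theta0)^(N - n)
p_n = binom.pmf(n, N, theta0)
print(f"Probability of exactly {n} heads in {N} flips at theta0 = {theta0}: {p_n:.5f}")
```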

The likelihood function
- We define the likelihood of the data, L(n; θ), as this same function p(n), now regarded as valid for any value of θ:

  L(n; \theta) = \binom{N}{n} \theta^n (1 - \theta)^{N-n}

- We note that the likelihood depends on the parameter θ and the observation n.

Likelihood function L(n; θ = 0.5) for this experiment
[Figure: L(n; θ = 0.5) plotted against n.]
Notice that L(n; θ = 0.5) is a probability density in n (its sum over n is one).

Likelihood function L(n = 527; θ) for this experiment
[Figure: L(n = 527; θ) plotted against θ.]
Notice that L(n = 527; θ) is not a probability density in θ (its area is not one).

Maximum likelihood estimation
- A sensible parameter estimate is then the value of θ that maximizes the likelihood of the observation n:

  \hat{\theta}(n) = \arg\max_{\theta} L(n; \theta)

- For this problem, take the derivative of L, set it to zero, and find

  \hat{\theta} = \frac{n}{N}

- After observing 527 heads out of 1000 flips, we conclude θ̂ = 527/1000 = 0.527.
- What could be simpler?
- But the question remains: can we conclude the coin is unfair? How?
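A small numerical sketch (Python with numpy/scipy; a simple grid search rather than the analytical derivative) confirming that the likelihood is maximized at θ̂ = n/N:

```python
import numpy as np
from scipy.stats import binom

N, n = 1000, 527
thetas = np.linspace(0.001, 0.999, 9999)   # grid of candidate parameter values
L = binom.pmf(n, N, thetas)                # likelihood L(n; theta) on the grid

theta_hat = thetas[np.argmax(L)]           # grid maximizer
print(f"grid maximizer: {theta_hat:.3f}   analytical n/N: {n / N:.3f}")
```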

Now the controversy
- We want to draw a statistically valid conclusion about whether the coin is fair.
- This question is the same one that the social scientists are asking about whether the data support the existence of ESP.
- We could pose the question in the form of a yes/no hypothesis: Is the coin fair?

Hypothesis testing
"Significance testing in general has been a greatly overworked procedure, and in many cases where significance statements have been made it would have been better to provide an interval within which the value of the parameter would be expected to lie."
Box, Hunter, and Hunter (1978, p. 109)

Constructing confidence intervals
- So let's instead pursue finding the confidence intervals.
- We have an estimator θ̂ = n/N. Notice that θ̂ is a random variable. Why? (What is not a random variable in this problem?)
- We know the probability density of n, so let's compute the probability density of θ̂:

  p_n(n) = \binom{N}{n} \theta_0^n (1 - \theta_0)^{N-n}

  p_{\hat{\theta}}(\hat{\theta}) = \binom{N}{N\hat{\theta}} \theta_0^{N\hat{\theta}} (1 - \theta_0)^{N(1 - \hat{\theta})}

Defining the confidence interval
- Define a new random variable z = θ̂ - θ0.
- We would like to find a positive scalar a > 0 such that

  \Pr(-a \le z \le a) = \alpha

  in which 0 < α < 1 is the confidence level.
- The definition implies that there is α-level probability that the true parameter θ0 lies in the (symmetric) confidence interval [θ̂ - a, θ̂ + a], or

  \Pr(\hat{\theta} - a \le \theta_0 \le \hat{\theta} + a) = \alpha

What's the rub?
- The problem with this approach is that the density of z depends on θ0:

  p_{\hat{\theta}}(\hat{\theta}) = \binom{N}{N\hat{\theta}} \theta_0^{N\hat{\theta}} (1 - \theta_0)^{N(1 - \hat{\theta})}

  p_z(z) = \binom{N}{N(z + \theta_0)} \theta_0^{N(z + \theta_0)} (1 - \theta_0)^{N(1 - z - \theta_0)}

- I cannot find a > 0 such that Pr(-a ≤ z ≤ a) = α unless I know θ0!
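To make the dependence concrete, here is a sketch (Python with numpy/scipy; the half-width a = 0.03 is chosen purely for illustration) that evaluates Pr(-a ≤ z ≤ a) for several values of θ0. Since z = θ̂ - θ0 = n/N - θ0, the event is a binomial interval:

```python
import numpy as np
from scipy.stats import binom

N, a = 1000, 0.03                             # illustrative fixed half-width

for theta0 in (0.5, 0.3, 0.1):
    lo = int(np.ceil(N * (theta0 - a)))       # smallest n with z >= -a
    hi = int(np.floor(N * (theta0 + a)))      # largest n with z <= +a
    prob = binom.cdf(hi, N, theta0) - binom.cdf(lo - 1, N, theta0)
    print(f"theta0 = {theta0:.1f}:  Pr(-a <= z <= a) = {prob:.3f}")
```

The same a gives different coverage probabilities for different θ0, which is exactly the difficulty.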

The effect of θ0 on the confidence interval
[Figure: the density p_z(z) plotted for two different values of θ0.]

The effect of θ0 on the confidence interval
[Figure: p_z(z) for θ0 = 0.5, with the interval satisfying Pr(θ̂ - a ≤ θ0 ≤ θ̂ + a) = 0.95 marked.]

The effect of θ0 on the confidence interval
[Figure: p_z(z) for a second value of θ0, with its 95% interval marked.]

OK, so now what do we do? Change the experiment
- Imagine instead that I draw a random θ from a uniform distribution on the interval [0, 1].
- Then I collect a sample of 1000 coin flips with this coin having value θ.
- What can I conclude from this experiment?
- Note that both θ and n are now random variables, and they are not independent.
- There is no true parameter θ0 in this problem.

Conditional density
Consider two random variables A, B. The conditional density of A given B, denoted p_{A|B}(a|b), is defined as

  p_{A|B}(a|b) = \frac{p_{A,B}(a, b)}{p_B(b)},    p_B(b) \ne 0

Conditional to Bayes

  p_{A|B}(a|b) = \frac{p_{A,B}(a, b)}{p_B(b)} = \frac{p_{B,A}(b, a)}{p_B(b)} = \frac{p_{B|A}(b|a) \, p_A(a)}{p_B(b)}

  p_{A|B}(a|b) = \frac{p_{B|A}(b|a) \, p_A(a)}{\int p_{B|A}(b|a) \, p_A(a) \, da}

According to Papoulis (1984, p. 30), the main idea is due to Thomas Bayes in 1763, but the final form was given by Laplace several years later.

The densities of (θ, n)
The joint density:

  p_{\theta,n}(\theta, n) = \binom{N}{n} \theta^n (1 - \theta)^{N-n}  for θ ∈ [0, 1] and n ∈ {0, 1, ..., N}, and 0 otherwise

The marginal densities:

  p_\theta(\theta) = \sum_{n=0}^{N} p(\theta, n) = 1

  p_n(n) = \int_0^1 p(\theta, n) \, d\theta = \frac{1}{N + 1}
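A quick numerical check of both marginals (a sketch in Python with numpy/scipy; the integral over θ is approximated with a midpoint rule rather than done analytically):

```python
import numpy as np
from scipy.stats import binom

N = 1000

# Marginal of theta: sum the joint density over n at an arbitrary fixed theta
theta = 0.37
p_theta = binom.pmf(np.arange(N + 1), N, theta).sum()

# Marginal of n: integrate the joint density over theta on [0, 1] (midpoint rule)
n = 527
m = 200_000
thetas = (np.arange(m) + 0.5) / m
p_n = binom.pmf(n, N, thetas).mean()

print(f"p_theta(theta) = {p_theta:.6f}   (should be 1)")
print(f"p_n(n) = {p_n:.3e}   vs 1/(N+1) = {1 / (N + 1):.3e}")
```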

Bayesian posterior
- Computing the conditional density gives

  p(\theta | n) = (N + 1) \binom{N}{n} \theta^n (1 - \theta)^{N-n} = (N + 1) B(n, N, \theta) = \beta(\theta; n + 1, N - n + 1)

- The Bayesian posterior is the famous beta distribution.
- Maximizing the posterior over θ gives the Bayesian estimate

  \bar{\theta} = \arg\max_{\theta} p(\theta | n) = \frac{n}{N}

  which agrees with the maximum likelihood estimate!
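The posterior is available directly from scipy's beta distribution (a sketch; variable names are mine, not from the slides):

```python
from scipy.stats import beta

N, n = 1000, 527
posterior = beta(n + 1, N - n + 1)     # p(theta | n) is Beta(n+1, N-n+1)

# MAP estimate: the mode of Beta(a, b) is (a-1)/(a+b-2), which reduces to n/N here
theta_map = n / N
print(f"MAP estimate: {theta_map:.3f}")
print(f"posterior mean: {posterior.mean():.4f}   posterior std: {posterior.std():.4f}")
```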

Conditional density p(θ | n = 527) for this experiment
[Figure: p(θ | n = 527) plotted against θ.]
Notice that (unlike L(n = 527; θ)) p(θ | n = 527) is a probability density in θ (its area is one).

Confidence intervals from Bayesian posterior
Computing confidence intervals is unambiguous. Find [a, b] such that

  \int_a^b p(\theta | n) \, d\theta = \alpha

and there is α-level probability that the random variable θ ∈ [a, b] after observation n.

Closer look at the conditional density
[Figure: p(θ | n = 527) with the interval endpoints marked and the enclosed area labeled α.]

So is the coin fair?
- The Bayesian conclusion is that the 90% symmetric confidence interval centered at θ̂ = 0.527 does not contain θ = 1/2.
- Therefore I conclude the coin is unfair with 90% confidence level.
- But at confidence level α ≥ 91.3%, the symmetric interval does include θ = 1/2. I cannot conclude the coin is unfair with greater than 91.3% confidence.
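A sketch reproducing both numbers (Python with scipy; it assumes, as on the slide, that the interval is taken symmetric about the estimate θ̂ = 0.527):

```python
from scipy.stats import beta
from scipy.optimize import brentq

N, n = 1000, 527
post = beta(n + 1, N - n + 1)          # Bayesian posterior for theta
theta_hat = n / N                      # center of the symmetric interval

# Largest symmetric interval [theta_hat - a, theta_hat + a] that still excludes 1/2
a_max = theta_hat - 0.5
alpha_max = post.cdf(theta_hat + a_max) - post.cdf(theta_hat - a_max)
print(f"largest confidence level excluding theta = 1/2: {alpha_max:.3f}")   # ~0.913

# The 90% symmetric interval: solve for its half-width a90
a90 = brentq(lambda a: post.cdf(theta_hat + a) - post.cdf(theta_hat - a) - 0.90,
             1e-6, 0.4)
print(f"90% interval: [{theta_hat - a90:.4f}, {theta_hat + a90:.4f}]")      # excludes 0.5
```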

Back to the NY Times
"Is this significant evidence that the coin is weighted? Classical analysis says yes. With a fair coin, the chances of getting 527 or more heads in 1,000 flips is less than 1 in 20, or 5 percent. To put it another way: the experiment finds evidence of a weighted coin with 95 percent confidence."
- What? That better not be classical analysis.
- For the binomial, it is true that

  \Pr(n \ge 527) = 1 - F(n = 526, \theta = 0.5) \approx 0.047

- But so what? We don't have n ≥ 527, we have n = 527.
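The tail probability quoted by the article is a one-line computation (sketch, assuming scipy):

```python
from scipy.stats import binom

# Pr(n >= 527) for a fair coin: 1 - F(n = 526; N = 1000, theta = 0.5)
p_tail = 1 - binom.cdf(526, 1000, 0.5)
print(f"Pr(n >= 527 | fair coin) = {p_tail:.4f}")   # just under 0.05
```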

Back to the NY Times
"Yet many statisticians do not buy it. ... It is thus more accurate, these experts say, to calculate the probability of getting that one number, 527, if the coin is weighted, and compare it with the probability of getting the same number if the coin is fair. Statisticians can show that this ratio cannot be higher than about 4 to 1..."
- Again, so what?

  r = \max_\theta \frac{B(527, 1000, \theta)}{B(527, 1000, 0.5)} = \frac{B(527, 1000, 0.527)}{B(527, 1000, 0.5)} = 4.30
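The 4-to-1 figure is simply a ratio of two binomial pmfs, with the numerator maximized over θ (sketch, assuming scipy):

```python
from scipy.stats import binom

N, n = 1000, 527
# The maximizing theta in the numerator is the MLE, theta = n/N = 0.527
r = binom.pmf(n, N, n / N) / binom.pmf(n, N, 0.5)
print(f"likelihood ratio r = {r:.2f}")   # about 4.3
```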

Back to the NY Times
"The point here, said Dr. Rouder, is that 4-to-1 odds just aren't that convincing; it's not strong evidence. And yet classical significance testing has been saying for at least 80 years that this is strong evidence, Dr. Speckman said in an e-mail."
- Four-to-one odds means two possible random outcomes have probabilities 0.2 and 0.8.
- What in this problem has probability 0.2?
- The quantity

  \frac{B(527, 1000, 0.5)}{B(527, 1000, 0.527)} = \frac{1}{4.30} \approx 0.23

  is not the probability of a random event.
- Where is this statement about 4-to-1 odds coming from?

What did we learn?
- Maximizing the likelihood of the observation gives a sensible estimator
- It may be problematic to compute confidence intervals when using maximum likelihood
- Conditional probability quantifies information
- Bayes' theorem is (almost) a definition of conditional probability
- If the parameter is also considered random, it is easy to construct the posterior distribution and confidence intervals
- The controversy here seems to be that some consider θ0 a fixed parameter, but cannot then find confidence intervals that are independent of this unknown parameter


Further reading
Thomas Bayes. An essay towards solving a problem in the doctrine of chances. Phil. Trans. Roy. Soc., 53:370-418, 1763. Reprinted in Biometrika, 35.
George E. P. Box, William G. Hunter, and J. Stuart Hunter. Statistics for Experimenters. John Wiley & Sons, New York, 1978.
Athanasios Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, Inc., second edition, 1984.

Questions or comments?

Study question
Consider the classic maximum likelihood problem for the linear model

  y = X \theta_0 + e

in which vector y is measured, parameter θ0 is unknown and to be estimated, e is normally distributed measurement error, and X is a constant matrix. When e ~ N(0, \sigma^2 I) and we don't know σ, we obtain the following distribution for the maximum likelihood estimate:

  \hat{\theta} \sim N(\theta_0, \sigma^2 (X^T X)^{-1})

This density contains two unknown parameters, θ0 and σ. But we can still obtain confidence intervals in this case. What's the difference?
