Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006
1 A Brief Review of Hypothesis Testing and Its Uses

p-values and pure significance tests (R.A. Fisher) focus on null hypothesis testing. Neyman-Pearson tests focus on null and alternative hypothesis testing. Both involve an appeal to long-run trials. They adopt an ex ante position (justify a procedure by the number of times it is successful if used repeatedly).
2 Pure Significance Tests

Focuses exclusively on the null hypothesis. Let (X_1, ..., X_T) be observations from a sample. Let t(X_1, ..., X_T) be a test statistic. If (1) we know the distribution of t(X) under H_0, and (2) the larger the value of t(X), the more the evidence against H_0, then p = Pr(t(X) ≥ t_obs | H_0). This is the evidence against the null hypothesis.
Observe that the p-value is a uniform(0,1) random variable: for a random variable X with distribution function F (absolutely continuous with respect to Lebesgue measure), U = F(X) is uniform whenever F is continuous. (Prove this for yourself. It is automatic from the definition.) The p-value is the probability that a statistic at least as extreme as the one observed would occur given that H_0 is the true state of affairs. A t test or F test for a regression coefficient is an example; the higher the test statistic, the more likely we are to reject. The p-value ignores any evidence on alternatives.
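A minimal simulation of the uniformity claim above: when H_0 is true and the test statistic has a continuous null distribution, the p-value is itself uniform(0,1). The statistic here is the z-score of a sample mean under H_0: μ = 0 with σ = 1 known; the sample size and replication count are illustrative choices.

```python
import random
from statistics import NormalDist

random.seed(0)
T = 25          # sample size per experiment (illustrative)
R = 20_000      # number of repeated experiments under the null
norm = NormalDist()

pvals = []
for _ in range(R):
    xbar = sum(random.gauss(0, 1) for _ in range(T)) / T
    z = xbar * T ** 0.5              # (xbar - mu0) / (sigma / sqrt(T))
    pvals.append(1 - norm.cdf(z))    # one-sided p-value

# Under uniformity, about 5% of p-values fall below 0.05 and their mean is ~0.5.
share_below_05 = sum(p < 0.05 for p in pvals) / R
print(round(share_below_05, 3), round(sum(pvals) / R, 3))
```

Repeating with any monotonic transformation of z gives the same p-values, illustrating point (1) on the next slide.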
R.A. Fisher liked this feature because it did not involve speculation about possibilities other than the one realized. p-values make an absolute statement about a model.
Questions I consider:
(1) How to conduct a best test? Compare alternative tests. Any monotonic transformation of the test statistic produces the same p-value.
(2) Pure significance tests depend on the sampling rule used. This is not necessarily bad.
(3) How to pool across studies (or across coefficients)?
2.1 Bayesian vs. Frequentist or Classical Approach

ISSUES:
(1) In what sense, and how well, do significance levels or p-values summarize evidence in favor of or against hypotheses?
(2) Do we always reject a null in a big enough sample? Meaningful hypothesis testing (Bayesian or classical) requires that significance levels decrease with sample size.
(3) Two views: β = 0 tests something meaningful vs. β = 0 is only an approximation and shouldn't be taken too seriously.
(4) How to test non-nested models (Cox non-nested test and extensions)? Classical form: H_0: Y = Xβ + U vs. H_1: Y = Zγ + V. Test H_0 against the alternative (this is asymmetric: H_1 is not the hypothesis of interest). Theil's R̄² test (it is asymmetric in stating the null). See Leamer, Chapter 3.
(5) How to quantify evidence about a model? (How to incorporate prior restrictions?) What is the strength of the evidence?
(6) How to account for model uncertainty: fishing, etc.
First consider the basic Neyman-Pearson structure; then switch over to a Bayesian paradigm.
It is useful to separate out (1) decision problems from (2) acts of data description. This is a topic of great controversy in statistics.
Question: In what sense does increasing sample size always lead to rejection of a hypothesis? If the null is not exactly true, we get rejections. (The power of the test goes to 1 for a fixed significance level as the sample size increases.)
Example to refresh your memory about Neyman-Pearson theory. Take a one-tail normal test about a mean. What is the test? Assume σ² is known. H_0: μ = μ_0; X_i ~ N(μ, σ²), i = 1, ..., T.
For any critical value c we get Pr( (X̄ − μ_0)/(σ/√T) > (c − μ_0)/(σ/√T) | H_0 ) = α(c). (Exploit the symmetry of the standard normal around the origin.) For a fixed α, we can solve for c(α):

c(α) = μ_0 + (σ/√T) Φ⁻¹(1 − α).
Now what is the probability of rejecting the hypothesis under alternatives (the power of the test)? Let μ_A be the alternative value of μ. Fix c to have a certain size α. (Use the previous calculations.)

Pr( X̄ > c(α) | μ_A ) = Pr( (X̄ − μ_A)/(σ/√T) > Φ⁻¹(1 − α) − (μ_A − μ_0)√T/σ ) = 1 − Φ( Φ⁻¹(1 − α) − (μ_A − μ_0)√T/σ ).

We are evaluating the probability of rejection when we allow μ to vary.
Thus the power equals α when μ_A = μ_0. If μ_A > μ_0, this probability goes to one as T → ∞. This is a consistent test.
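A sketch of the power function just derived for the one-tailed z-test, power(μ_A) = 1 − Φ( Φ⁻¹(1−α) − (μ_A − μ_0)√T/σ ); the numerical values shown are illustrative.

```python
from statistics import NormalDist

norm = NormalDist()

def power(mu_a, T, alpha=0.05, mu0=0.0, sigma=1.0):
    """Power of the one-tailed test of H0: mu = mu0 at alternative mu_a.

    Reject when the sample mean exceeds c(alpha) = mu0 + (sigma/sqrt(T)) * z_{1-alpha},
    so power = 1 - Phi( z_{1-alpha} - (mu_a - mu0) * sqrt(T) / sigma ).
    """
    z_alpha = norm.inv_cdf(1 - alpha)
    return 1 - norm.cdf(z_alpha - (mu_a - mu0) * T ** 0.5 / sigma)

# At the null the power equals the size; away from the null it rises to 1 with T.
print(round(power(0.0, T=10), 3))
print(round(power(0.5, T=10), 3))
print(round(power(0.5, T=100), 3))
```

Evaluating at μ_A = μ_0 returns exactly α, and for any μ_A > μ_0 the power climbs toward 1 as T grows, which is the consistency claim above.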
Now, suppose we seek to test H_0: μ = μ_0. Use a fixed α. If H_0 is true, (X̄ − μ_0)√T/σ ~ N(0, 1) for every T, while the distribution of X̄ becomes more and more concentrated at the true mean. With fixed α, for T large we reject the null unless μ_0 equals the true μ exactly.
[Figure: power of the test, Prob[mean(X_T) > c_T(α)], plotted against the true mean for T = 1, 2, 10, ...; the power curves steepen around μ_0 as T grows (α = 0.05, σ = 1, μ_0 = 0, alternative μ_A).]
Parenthetical note: observe that if we measure X with the slightest error and the errors do not have mean zero, we always reject H_0 for T big enough.
Design of Sample Size

Suppose that we fix the power at β̄ and pick c(α). What sample size produces the desired power? We postulate the alternative μ = μ_A.
Pr( X̄ > c(α) | μ_A ) = 1 − Φ( Φ⁻¹(1 − α) − (μ_A − μ_0)√T/σ ) = β̄
⇒ Φ⁻¹(1 − α) − (μ_A − μ_0)√T/σ = Φ⁻¹(1 − β̄)
⇒ √T = σ [Φ⁻¹(1 − α) + Φ⁻¹(β̄)] / (μ_A − μ_0),

using Φ⁻¹(1 − β̄) = −Φ⁻¹(β̄).
This is the minimum T needed to reject the null at the specified alternative: it has power β̄ for effect size μ_A − μ_0. Pick sample size on this basis. (This is used in sample design.) What value of α should we use?
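The sample-size formula above, T = [σ(Φ⁻¹(1−α) + Φ⁻¹(β̄))/(μ_A − μ_0)]², can be sketched as follows; the 0.5-standard-deviation effect and 80% power target are illustrative choices, not from the slides.

```python
import math
from statistics import NormalDist

norm = NormalDist()

def min_sample_size(effect, power_target, alpha=0.05, sigma=1.0):
    """Smallest T giving the one-tailed z-test the target power at
    the alternative mu_A = mu0 + effect:
        sqrt(T) = sigma * (z_{1-alpha} + z_{power}) / effect.
    """
    z_alpha = norm.inv_cdf(1 - alpha)
    z_power = norm.inv_cdf(power_target)
    return math.ceil((sigma * (z_alpha + z_power) / effect) ** 2)

# e.g. detect an effect of 0.5 sd with 80% power at the 5% level
print(min_sample_size(0.5, 0.80))
```

Halving the effect size quadruples the required sample, as the squared formula implies.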
Observe that two investigators with the same α but different sample sizes have different power. This is often ignored in empirical work. Why not equalize the power of the tests across samples? Why use the same size of test in all empirical work?
3 Alternative Approaches to Testing and Inference

3.1 Classical Hypothesis Testing

(1) Appeals to long-run frequencies.
(2) Designs an ex ante rule that on average works well; e.g., 5% of the time in repeated trials we make the error of rejecting a true null for a 5% significance level.
(3) Entails a hypothetical set of trials, and is based on a long-run justification.
(4) Consistency of an estimator is an example of this mindset. E.g., Y = Xβ + U with E(XU) ≠ 0: OLS is biased for β. Suppose we have an instrument Z: E(ZU) = 0, E(ZX) ≠ 0. Then

plim β̂_OLS = β + E(XU)/E(X²)
plim β̂_IV = β + E(ZU)/E(ZX) = β, since E(ZU) = 0.
Another consistent estimator: (1) use the biased OLS estimator for the first block of observations, (2) then use IV thereafter. This is likely to have poor small-sample properties, but on a long-run frequency justification it's just fine.
3.2 Examples of why some people get very unhappy about classical procedures

Example 1. Sample size T = 2, data (X_1, X_2) with Pr(X_i = θ_0 − 1) = Pr(X_i = θ_0 + 1) = 1/2, i = 1, 2. One possible (smallest) confidence set for θ_0 is

C(X_1, X_2) = (X_1 + X_2)/2 if X_1 ≠ X_2; X_1 − 1 if X_1 = X_2.
Thus 75% of the time C(X_1, X_2) contains θ_0 (in 75% of repeated trials it covers θ_0). (Verify this.) Yet if X_1 ≠ X_2, we are certain that the confidence set exactly covers the true value: 100% of the time it is right. Ex post, or conditional on the data, we get the exact value.
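A quick simulation of Example 1. The transcription garbles the second branch of the confidence set; it is taken here to be X_1 − 1 when X_1 = X_2, an assumption that delivers the 75% unconditional coverage quoted above.

```python
import random

random.seed(1)
theta0 = 3.0        # illustrative true value
R = 100_000
covered = exact_when_unequal = unequal = 0

for _ in range(R):
    x1 = theta0 + random.choice([-1, 1])
    x2 = theta0 + random.choice([-1, 1])
    # the set is {(x1+x2)/2} if the draws differ, else {x1 - 1} (assumed branch)
    c = (x1 + x2) / 2 if x1 != x2 else x1 - 1
    covered += (c == theta0)
    if x1 != x2:
        unequal += 1
        exact_when_unequal += (c == theta0)

print(round(covered / R, 3))            # ~0.75 unconditional coverage
print(exact_when_unequal == unequal)    # ex post: certain coverage when x1 != x2
```

The simulation shows both facts at once: 75% coverage over repeated trials, and exact coverage conditional on X_1 ≠ X_2.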
Example 2 (D.R. Cox). (1) You have data, say on DNA from crime scenes. (2) You can send the data to a New York or a California lab. Both labs seem equally good. (3) You toss a coin to decide which lab analyzes the data. Should the coin flip be accounted for in the design of the test statistic?
Example 3. Test H_0: θ = −1 vs. H_A: θ = 1, with X ~ N(θ, .25). Consider the rejection region: reject if X ≥ 0. If we observe X = 0, we would have α = .0228. Size is .0228; power under the alternative is .9772. Looks good, but if we reverse the roles of null and alternative, it would also look good.
[Figure: densities φ((x − μ_0)/σ) and φ((x − μ_A)/σ) for μ_0 = −1, μ_A = 1, σ² = 0.25, with the rejection region X ≥ 0 marked; when X = 0 is observed, the observation is equally likely under either hypothesis.]
Example 4. X ∈ {1, 2, 3}; we have two possible hypotheses, H_0 and H_1. Consider the test that accepts H_0 when X = 3 and accepts H_1 otherwise (α = .01 and power = .99: high power).
If we observe X = 1, we reject H_0. But the likelihood ratio in favor of H_0, Pr(X = 1 | H_0)/Pr(X = 1 | H_1), is 9.
Example 5. Reject H_0 when X = 1 or 2. Power = .99, size = .01. Is it reasonable to pick H_1 over H_0 when X = 1 is observed? (The likelihood ratio does not strongly support the alternative.)
Example 6 (Lindley and Phillips, American Statistician, August 1976). Consider an experiment: we draw 12 balls from an urn. The urn has an infinite number of balls. θ = probability of black; (1 − θ) = probability of red.
Draws are independent, with the same probability of black on each trial, so

Pr(k of 12 are black) = C(12, k) θ^k (1 − θ)^{12−k}.

Suppose that we draw 9 black balls and 3 red balls. What is the evidence in support of the hypothesis that red and black are equally likely, θ = 1/2?
We might choose the critical region {9, 10, 11, 12} (number of blacks) to reject the null of θ = 1/2:

p = Pr(k ∈ {9, 10, 11, 12} | θ = 1/2) = [C(12,9) + C(12,10) + C(12,11) + C(12,12)] (1/2)^12 = 299/4096 ≈ 7.3%.

We do not reject for α = .05. This sampling distribution assumes that 10 blacks and 2 reds is a possibility. (It is based on a counterfactual space of what else could occur and with what probability.)
Now consider an alternative sampling rule: draw balls until 3 red balls are observed and then stop. (So 10 blacks and 2 reds on a trial of 12 is not possible, as it was before.) The distribution of n, the number of draws in this experiment, is negative binomial:

Pr(n) = C(n − 1, 2) θ^{n−3} (1 − θ)^3.

Rejection region: n ∈ {12, 13, ...}, i.e., if the number of blacks is at least 9, reject. p = Pr(n ≥ 12 | θ = 1/2) ≈ 3.3%. Now significant: reject the null of equality.
Now suppose that in both cases you have 9 black and 3 red on a single trial.
In computing p-values and significance levels, you need to model what didn't occur. The answer depends on the stopping rule and the hypothetical admissible sample space.
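The two tail probabilities in the Lindley-Phillips example can be computed exactly; with these formulas they work out to about 7.3% (binomial) and 3.3% (negative binomial) for the same 9-black, 3-red data.

```python
from math import comb

# Null: black and red equally likely (theta = 1/2). Data: 9 black, 3 red.

# (a) Binomial sampling: 12 draws fixed in advance.
#     p = Pr(9 or more blacks in 12 | theta = 1/2)
p_binomial = sum(comb(12, k) for k in range(9, 13)) / 2 ** 12

# (b) Negative binomial sampling: draw until the 3rd red, then stop.
#     n draws means n - 3 blacks; Pr(n) = C(n-1, 2) * theta^(n-3) * (1-theta)^3.
#     p = Pr(12 or more draws) = 1 - Pr(11 or fewer draws)
p_negbin = 1 - sum(comb(n - 1, 2) * 0.5 ** n for n in range(3, 12))

print(round(p_binomial, 4), round(p_negbin, 4))
```

Same data, same likelihood in θ up to a constant, yet one p-value clears the 5% threshold and the other does not: only the counterfactual sample space differs.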
3.3 Additional Problems with Classical Statistics

Classical: the design of what could happen affects p-values. Trials with 3 possible results, x = 1, 2, 3 (from Hacking). Hypotheses H_0 and H_1 assign probabilities to the three outcomes. Reject H_0 iff x = 1 occurs. This is not the most powerful test; in fact, it is a biased test (its size, .01, exceeds its power at some alternatives).
Power (from Hacking). Test 1: reject if x = 3 occurs; size = .001, power = .97. Test 2: reject iff x = 1 or 2 occurs. Test 2 is inferior to Test 1 in power, but if x = 1 occurs, we reject and then we know it has to be H_1. Test 2 is a much better ex post test than Test 1.
3.4 Likelihood Principle

All of the information is in the sample. Look at the likelihood as the best summary of the sample.
Likelihood Approach

Recall from our discussion of asymptotics (Part III) that, under the regularity conditions of Theorems 2 and 3, a second-order expansion of the log likelihood around the MLE θ̂ gives

ln L(θ) = ln L(θ̂) + (θ − θ̂)′ S(θ̂) + (1/2)(θ − θ̂)′ H(θ̂)(θ − θ̂) + o_p(1),

where the score S(θ̂) = ∂ ln L(θ̂)/∂θ = 0 at the MLE, and in the i.i.d. case −(1/T) H(θ̂) = −(1/T) ∂² ln L(θ̂)/∂θ ∂θ′ → I(θ_0), the information matrix.
In terms of the information matrix, for the likelihood case,

ln L(θ) ≈ ln L(θ̂) − (T/2)(θ − θ̂)′ I(θ_0)(θ − θ̂) + o_p(1).

So we know that as T → ∞, the likelihood L looks like a normal, e.g.,

N(μ, Σ): f(θ) = (2π)^{−k/2} |Σ|^{−1/2} exp( −(1/2)(θ − μ)′ Σ⁻¹ (θ − μ) ),
ln f(θ) = −(k/2) ln(2π) − (1/2) ln|Σ| − (1/2)(θ − μ)′ Σ⁻¹ (θ − μ).
So the likelihood converges to a normal-looking criterion and has its mode at θ̂, which converges to θ_0. The most likely value is the ML estimator (the mode of the likelihood converges to θ_0).
3.5 Bayesian Principle

Use prior information in conjunction with sample information: place priors on parameters. The classical method and the likelihood principle sharply separate parameters from data (random variables). The Bayesian method does not: all parameters are random variables.
The Bayesian and likelihood approaches both use the likelihood. Likelihood: use data from the experiment (evidence concentrates on θ_0). Bayesian: use data from the experiment plus a prior.
The Bayesian approach postulates a prior p(θ), a probability distribution over θ, and computes the posterior via Bayes' theorem:

p(θ | Y) [posterior] = κ · L(Y | θ) [likelihood] · p(θ) [prior],

where κ is a constant defined so the posterior integrates to 1. We get a posterior independent of constants (and therefore of the sampling rule).
3.6 De Finetti's Theorem

Let X_i denote a binary variable in {0, 1}, i.i.d., with Pr(X_i = 1) = θ and Pr(X_i = 0) = 1 − θ. Let P(r, T) = probability of r ones in T trials, r ≥ 0. Then

P(r, T) = ∫₀¹ C(T, r) θ^r (1 − θ)^{T−r} dμ(θ)

for some measure μ(θ) ≥ 0. (This is just the standard Hausdorff moment problem.)
For this problem a natural conjugate prior is the beta:

g(θ) = θ^{a−1} (1 − θ)^{b−1} / B(a, b), 0 ≤ θ ≤ 1.

a = b = 1 gives the uniform prior; E(θ) = a/(a + b).
[Figures: Beta(θ; a, b) probability density functions, one panel per a ∈ {0.1, 0.5, 1, 2, 3, 5, 10}, each showing curves for b ∈ {0.1, 0.5, 1, 2, 3, 5, 10}.]
Posterior:

g(θ | data) = κ · θ^k (1 − θ)^{T−k} [likelihood] · θ^{a−1} (1 − θ)^{b−1} [prior],

where the data are k blacks in T draws and κ is a normalizing constant making the density integrate to one:

κ ∫₀¹ θ^{k+a−1} (1 − θ)^{T−k+b−1} dθ = 1.

Observe, crucially, that the normalizing constant is the same for both sampling rules we discussed in the red-ball and black-ball problem. Why? Because we choose κ to make g(θ | data) integrate to one.
Mean of the posterior with a beta prior:

E(θ | data) = (k + a) / [(k + a) + (T − k + b)].

Notice: the constants that played such a crucial role in the sampling distribution play no role here; they vanish into the normalizing constant.

Mode of the posterior = (k + a − 1) / [(k + a − 1) + (T − k + b − 1)].

The likelihood corresponds to T trials with k black and T − k red. The prior corresponds to (a + b − 2) trials with (a − 1) black and (b − 1) red.
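The conjugate update above can be sketched in a few lines; applying it to the urn data (9 black, 3 red) with a uniform prior gives the same posterior under either stopping rule, since the combinatorial constants cancel.

```python
def beta_posterior(a, b, k, T):
    """Beta(a, b) prior on theta, k successes (blacks) in T trials.
    Posterior is Beta(a + k, b + T - k); return its parameters,
    mean, and (valid when both posterior parameters exceed 1) mode."""
    a_post, b_post = a + k, b + T - k
    mean = a_post / (a_post + b_post)
    mode = (a_post - 1) / (a_post + b_post - 2)
    return a_post, b_post, mean, mode

# Uniform prior (a = b = 1), 9 blacks and 3 reds from the urn example.
a_post, b_post, mean, mode = beta_posterior(1, 1, 9, 12)
print(a_post, b_post, round(mean, 4), round(mode, 4))
```

With the uniform prior, the posterior is Beta(10, 4): mean 10/14 ≈ 0.714, mode 0.75 (the sample fraction of blacks).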
Empirical Bayes Approach

Estimate the prior. Go to the beta-binomial example:

P(k | a, b) = ∫₀¹ C(T, k) θ^k (1 − θ)^{T−k} · [θ^{a−1} (1 − θ)^{b−1} / B(a, b)] dθ = C(T, k) B(k + a, T − k + b) / B(a, b).

Now θ is a heterogeneity parameter distributed Beta(a, b). Estimate a and b as parameters from a string of trials with reds and blacks. θ is a person-specific parameter.
A similar idea appears in the linear regression model Y = Xβ + U. (Use notes from the second day of class for the proof of identification; we can identify means and variances.) Let β = β̄ + ε, E(ε) = 0, and assume ε is independent of X and U. Then

Y = X β̄ + (Xε + U), E(Y | X) = X β̄, Var(Y | X) = X Σ_ε X′ + σ²_U.

Use squared OLS residuals to identify Σ_ε given σ²_U.
Notice: we can extend the model to allow β̄ = Zγ + ε and identify γ (a hierarchical model).
Digression: take the classical normal linear regression model Y = Xβ + U, E(UU′) = σ² I:

β̂ = (X′X)⁻¹ X′Y, Var(β̂) = σ² (X′X)⁻¹.

Assume σ² known. Take a prior on β: β ~ N(β̄, σ² A⁻¹). The posterior is normal:

β | Y ~ N( (A + X′X)⁻¹ (A β̄ + X′X β̂), σ² (A + X′X)⁻¹ ).
Thus, we can think of the prior as a sample of observations with A playing the role of the X′X moment matrix and β̄ playing the role of the OLS estimate from that prior sample. Compare: stacking the prior "data" with the sample and running OLS on the stacked system reproduces the posterior mean (A + X′X)⁻¹ (A β̄ + X′X β̂). (Prove this.) See Leamer for the more general case where σ² is unknown (gamma prior).
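A one-regressor sketch of the conjugate-normal posterior above, with σ² known. In the scalar case the posterior mean is a precision-weighted average of the prior mean and OLS, E[β | Y] = (A β̄ + S_xx β̂)/(A + S_xx) with S_xx = Σ x_t²; the data and prior precision A below are illustrative.

```python
def posterior_mean(beta_bar, A, x, y):
    """Scalar regression y = beta*x + u, prior beta ~ N(beta_bar, sigma^2 / A).
    Returns (posterior mean, OLS estimate)."""
    Sxx = sum(xi * xi for xi in x)
    Sxy = sum(xi * yi for xi, yi in zip(x, y))
    beta_hat = Sxy / Sxx                                  # OLS
    return (A * beta_bar + Sxx * beta_hat) / (A + Sxx), beta_hat

x = [1.0, 2.0, 3.0, 4.0]          # illustrative data
y = [1.1, 2.1, 2.9, 4.2]
post, ols = posterior_mean(beta_bar=0.0, A=5.0, x=x, y=y)
print(round(ols, 4), round(post, 4))
```

As A → 0 (a diffuse prior) the posterior mean collapses to OLS; as A grows it is pulled toward the prior mean, mirroring the stacked-sample interpretation in the slide.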
To compute evidence on one hypothesis vs. another, use the posterior odds ratio:

Pr(H_1 | Y) / Pr(H_0 | Y) = [Pr(Y | H_1) / Pr(Y | H_0)] × [Pr(H_1) / Pr(H_0)].

Hypotheses are restrictions on the prior (e.g., different values of E(β)).
Classical Approach vs. Bayesian Approach

Assumption regarding experiment. Classical: events independent. Bayesian: events form exchangeable sequences, given a probability.

Interpretation of probability. Classical: relative frequency; applies only to repeated events. Bayesian: degrees of belief; applies both to unique events and to sequences of events.

Statistical inferences. Classical: based on the sampling distribution; the sample space or stopping rule must be specified. Bayesian: based on the posterior distribution; the prior distribution must be assessed.

Estimates of parameters. Classical: requires a theory of estimation. Bayesian: descriptive statistics of the posterior distribution.

Intuitive judgement. Classical: used in setting significance levels, in the choice of procedure, and in other ways. Bayesian: formally incorporated in the prior distribution.

Source: Lindley, D.V. and Phillips, L.D. (1976). Inference for a Bernoulli Process (A Bayesian View). American Statistician 30(3).
4 Bayesian Testing

Point null vs. point alternative test (Leamer example). Think of a regression model: H_0: Y = Xβ_0 + U vs. H_1: Y = Xβ_1 + U.

Posterior odds ratio: Pr(H_1 | Y) / Pr(H_0 | Y) = [Bayes factor: f(Y | H_1) / f(Y | H_0)] × [prior odds ratio: Pr(H_1) / Pr(H_0)].
Predictive density:

f(Y | H_i) = ∫∫ f(Y | β, σ²) [likelihood] · g(β, σ² | H_i) [prior density] dβ dσ².

The evidence supports the model with the higher posterior probability.
Example: Y_t ~ N(θ, σ²), so Ȳ ~ N(θ, σ²/T).

H_0: θ_0 = 0, σ² = 1; H_1: θ_1 = 1, σ² = 1.
Under H_0: Ȳ ~ N(0, 1/T); under H_1: Ȳ ~ N(1, 1/T).
Typical Neyman-Pearson rule: reject H_0 if Ȳ > c; accept H_0 if Ȳ ≤ c. Type 1 and Type 2 errors:

α(c) = Pr(Ȳ > c | θ = 0), β(c) = Pr(Ȳ ≤ c | θ = 1).

Example: fixing α = .05 pins down c.
Bayes Approach

Pr(H_0 | Ȳ) = π_0 f(Ȳ | 0) / [π_0 f(Ȳ | 0) + π_1 f(Ȳ | 1)],
Pr(H_1 | Ȳ) = π_1 f(Ȳ | 1) / [π_0 f(Ȳ | 0) + π_1 f(Ȳ | 1)].
Pr(H_0 | Ȳ) / Pr(H_1 | Ȳ) = [π_0 f(Ȳ | 0)] / [π_1 f(Ȳ | 1)]
= exp( −(T/2σ²)[Ȳ² − (Ȳ − 1)²] ) π_0/π_1
= exp( −(T/2σ²)(2Ȳ − 1) ) π_0/π_1
= exp( (T/2)(1 − 2Ȳ) ) π_0/π_1.

(Recall σ² = 1 under both null and alternative.)
(2/T) ln[ Pr(H_0 | Ȳ) / Pr(H_1 | Ȳ) ] = (1 − 2Ȳ) + (2/T) ln( Pr(H_0)/Pr(H_1) ).

If this is ≥ 0, accept H_0, i.e., accept when Ȳ ≤ 1/2 + (1/T) ln( Pr(H_0)/Pr(H_1) ).

As T gets big, the cutoff changes with sample size unless Pr(H_0) = Pr(H_1) = 1/2. Notice that this is different from the classical statistical rule of a fixed cutoff point.
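A small sketch of the Bayes rule just derived for H_0: θ = 0 vs. H_1: θ = 1 with Ȳ ~ N(θ, 1/T): accept H_0 when Ȳ ≤ 1/2 + ln(π_0/π_1)/T. The prior values are illustrative.

```python
import math

def posterior_odds_h0(ybar, T, pi0=0.5, pi1=0.5):
    """Posterior odds Pr(H0 | Ybar)/Pr(H1 | Ybar) for the two-point problem."""
    return math.exp(T / 2 * (1 - 2 * ybar)) * pi0 / pi1

def cutoff(T, pi0=0.5, pi1=0.5):
    """Accept H0 iff Ybar <= this cutoff."""
    return 0.5 + math.log(pi0 / pi1) / T

# Equal priors: cutoff is exactly 1/2 at every T.
# Unequal priors: the cutoff moves, but shrinks back toward 1/2 as T grows.
print(cutoff(10), cutoff(10, pi0=0.9, pi1=0.1), cutoff(1000, pi0=0.9, pi1=0.1))
print(posterior_odds_h0(0.5, T=10))   # posterior odds are exactly 1 at Ybar = 1/2
```

This makes the contrast with the classical rule concrete: the Bayesian threshold depends on T (through the prior-odds term) rather than being a fixed critical value.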
Compare this to the fixed-α rule in Example 3. This is a Bayesian version of Example 3, with H_0: θ = 0 vs. H_A: θ = 1 instead of H_0: θ = −1 vs. H_A: θ = 1. Note that we get the same kind of rule in both cases when Pr(H_0) = Pr(H_1) = 1/2.
Point Null vs. Composite Alternative

Same setup as in the previous case: Ȳ ~ N(θ, σ²/T). H_0: θ = 0 vs. H_A: θ ≠ 0. σ² is unspecified, but common across models. Turn the Bayes crank. Likelihood factor: f(Ȳ; θ, σ²/T) vs. f(Ȳ; 0, σ²/T). Relative likelihood:

L = exp( (T/2σ²)[Ȳ² − (Ȳ − θ)²] ).
What value of θ is best supported by the data? Recall the likelihood approach (it focuses on the outcomes that are most likely): the relative likelihood is maximized at θ = Ȳ, where L = exp( T Ȳ² / (2σ²) ).
[Figure: relative likelihood exp([Ȳ² − (Ȳ − θ)²] T/(2σ²)) as a function of θ, for Ȳ = 1, σ = 1, T = 1.]
The p-value approach uses absolute, not relative, likelihood. In what sense is the observed value most likely? Likelihood approach: evaluate the relative likelihood at the null θ = 0 and we get

L = exp( T Ȳ² / (2σ²) ) = exp( t²/2 ), where t = √T Ȳ/σ is the t-statistic for θ = 0.

This is an expression of the support for the MLE relative to the hypothesis θ = 0: a big value leads to rejection of the null. But this approach does not worry about the alternative.
Frequency Theory or Sampling Approach

Look at the sampling distribution of the model. Test statistic t, centered at θ = 0: α(c) = Pr(|t| > c | θ = 0); e.g., if |t| > 1.96 we reject. p-value: the knife-edge α at which the value that occurred would just lead to rejection. At any level less than the p-value, the null hypothesis is not rejected.
Significance level: is what occurred unlikely? Relative likelihood is a relative idea (null vs. alternative): support for one hypothesis vs. support for another.
Bayes Approach: allocate positive probability to the null. (Otherwise the posterior probability of a point null is 0.) Prior:

π if θ = 0 (point mass); (1 − π) g_0(θ) if θ ≠ 0, with g_0 = N(0, σ_0²).
(Point mass): posterior odds

Pr(H_1 | Ȳ) / Pr(H_0 | Ȳ) = [ (1 − π) ∫_{θ≠0} exp( −(T/2σ²)(Ȳ − θ)² ) g_0(θ) dθ ] / [ π exp( −(T/2σ²) Ȳ² ) ].

Complete the square in the numerator and integrate out θ.

Side manipulations: look at the numerator's exponent, −(T/2σ²)(Ȳ − θ)² − θ²/(2σ_0²). Completing the square in θ, with a = T/σ² + 1/σ_0² and b = T Ȳ/σ²,

exp( −(T/2σ²)(Ȳ − θ)² − θ²/(2σ_0²) ) = exp( −T Ȳ²/(2σ²) ) · exp( b²/(2a) ) · exp( −(a/2)(θ − b/a)² ).

Then integrate out θ and we get (cancelling terms):

Pr(H_1 | Ȳ) / Pr(H_0 | Ȳ) = [(1 − π)/π] × (1 + T σ_0²/σ²)^{−1/2} exp( (t²/2) · (T σ_0²/σ²) / (1 + T σ_0²/σ²) ),

where the second factor is the Bayes factor and t = √T Ȳ/σ. Notice that the higher t is, the more likely we are to reject H_0. However, as T → ∞ for fixed t, Pr(H_1 | Ȳ)/Pr(H_0 | Ȳ) → 0. Notice t = √T Ȳ/σ is the t-statistic for θ = 0; it is O_p(1) under the null.
We support H_0 (the "Lindley paradox"). Bayesians use the sample size to adjust the critical or rejection region. In the classical case, with α fixed, the power of the test goes to 1. (It overweights the null hypothesis.) Issue: which weighting of H_0 and H_1 is better?
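The Lindley paradox above can be sketched numerically. Hold the t-statistic fixed at the classically "significant" value 1.96 and let T grow; the posterior odds in favor of H_1, [(1−π)/π](1 + Tσ_0²/σ²)^{−1/2} exp((t²/2)·(Tσ_0²/σ²)/(1 + Tσ_0²/σ²)), fall toward zero. The prior mass π = 1/2 and variances are illustrative.

```python
import math

def odds_h1_vs_h0(t, T, pi=0.5, sigma2=1.0, sigma0_2=1.0):
    """Posterior odds Pr(H1 | data)/Pr(H0 | data) for a point null theta = 0
    with prior mass pi, theta ~ N(0, sigma0_2) under H1, Ybar ~ N(theta, sigma2/T)."""
    r = T * sigma0_2 / sigma2
    bayes_factor = (1 + r) ** -0.5 * math.exp(t * t / 2 * r / (1 + r))
    return (1 - pi) / pi * bayes_factor

# Fixed t = 1.96 (classical 5% rejection), growing T:
for T in (10, 1_000, 100_000):
    print(T, odds_h1_vs_h0(1.96, T))
```

For large T the odds drop below 1, so the Bayesian analysis supports the point null at exactly the evidence level where the fixed-α classical test rejects it.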
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University August 30, 2017 Today: Decision trees Overfitting The Big Picture Coming soon Probabilistic learning MLE,
More informationg-priors for Linear Regression
Stat60: Bayesian Modeling and Inference Lecture Date: March 15, 010 g-priors for Linear Regression Lecturer: Michael I. Jordan Scribe: Andrew H. Chan 1 Linear regression and g-priors In the last lecture,
More informationCS 361: Probability & Statistics
March 14, 2018 CS 361: Probability & Statistics Inference The prior From Bayes rule, we know that we can express our function of interest as Likelihood Prior Posterior The right hand side contains the
More informationLecture Testing Hypotheses: The Neyman-Pearson Paradigm
Math 408 - Mathematical Statistics Lecture 29-30. Testing Hypotheses: The Neyman-Pearson Paradigm April 12-15, 2013 Konstantin Zuev (USC) Math 408, Lecture 29-30 April 12-15, 2013 1 / 12 Agenda Example:
More informationHypothesis Testing - Frequentist
Frequentist Hypothesis Testing - Frequentist Compare two hypotheses to see which one better explains the data. Or, alternatively, what is the best way to separate events into two classes, those originating
More information6.4 Type I and Type II Errors
6.4 Type I and Type II Errors Ulrich Hoensch Friday, March 22, 2013 Null and Alternative Hypothesis Neyman-Pearson Approach to Statistical Inference: A statistical test (also known as a hypothesis test)
More informationComparison of Bayesian and Frequentist Inference
Comparison of Bayesian and Frequentist Inference 18.05 Spring 2014 First discuss last class 19 board question, January 1, 2017 1 /10 Compare Bayesian inference Uses priors Logically impeccable Probabilities
More informationBayesian Models in Machine Learning
Bayesian Models in Machine Learning Lukáš Burget Escuela de Ciencias Informáticas 2017 Buenos Aires, July 24-29 2017 Frequentist vs. Bayesian Frequentist point of view: Probability is the frequency of
More informationClass 26: review for final exam 18.05, Spring 2014
Probability Class 26: review for final eam 8.05, Spring 204 Counting Sets Inclusion-eclusion principle Rule of product (multiplication rule) Permutation and combinations Basics Outcome, sample space, event
More informationMachine Learning CMPT 726 Simon Fraser University. Binomial Parameter Estimation
Machine Learning CMPT 726 Simon Fraser University Binomial Parameter Estimation Outline Maximum Likelihood Estimation Smoothed Frequencies, Laplace Correction. Bayesian Approach. Conjugate Prior. Uniform
More informationWhy Try Bayesian Methods? (Lecture 5)
Why Try Bayesian Methods? (Lecture 5) Tom Loredo Dept. of Astronomy, Cornell University http://www.astro.cornell.edu/staff/loredo/bayes/ p.1/28 Today s Lecture Problems you avoid Ambiguity in what is random
More information9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering
Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make
More informationThe Bayesian Paradigm
Stat 200 The Bayesian Paradigm Friday March 2nd The Bayesian Paradigm can be seen in some ways as an extra step in the modelling world just as parametric modelling is. We have seen how we could use probabilistic
More informationUsing Probability to do Statistics.
Al Nosedal. University of Toronto. November 5, 2015 Milk and honey and hemoglobin Animal experiments suggested that honey in a diet might raise hemoglobin level. A researcher designed a study involving
More informationLinear Models A linear model is defined by the expression
Linear Models A linear model is defined by the expression x = F β + ɛ. where x = (x 1, x 2,..., x n ) is vector of size n usually known as the response vector. β = (β 1, β 2,..., β p ) is the transpose
More information18.05 Practice Final Exam
No calculators. 18.05 Practice Final Exam Number of problems 16 concept questions, 16 problems. Simplifying expressions Unless asked to explicitly, you don t need to simplify complicated expressions. For
More informationSYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions
SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu
More informationStatistical Methods in Particle Physics
Statistical Methods in Particle Physics Lecture 11 January 7, 2013 Silvia Masciocchi, GSI Darmstadt s.masciocchi@gsi.de Winter Semester 2012 / 13 Outline How to communicate the statistical uncertainty
More informationreview session gov 2000 gov 2000 () review session 1 / 38
review session gov 2000 gov 2000 () review session 1 / 38 Overview Random Variables and Probability Univariate Statistics Bivariate Statistics Multivariate Statistics Causal Inference gov 2000 () review
More informationIntroduction to Machine Learning. Lecture 2
Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for
More information1 Hypothesis Testing and Model Selection
A Short Course on Bayesian Inference (based on An Introduction to Bayesian Analysis: Theory and Methods by Ghosh, Delampady and Samanta) Module 6: From Chapter 6 of GDS 1 Hypothesis Testing and Model Selection
More informationEstimation of reliability parameters from Experimental data (Parte 2) Prof. Enrico Zio
Estimation of reliability parameters from Experimental data (Parte 2) This lecture Life test (t 1,t 2,...,t n ) Estimate θ of f T t θ For example: λ of f T (t)= λe - λt Classical approach (frequentist
More informationINTRODUCTION TO BAYESIAN INFERENCE PART 2 CHRIS BISHOP
INTRODUCTION TO BAYESIAN INFERENCE PART 2 CHRIS BISHOP Personal Healthcare Revolution Electronic health records (CFH) Personal genomics (DeCode, Navigenics, 23andMe) X-prize: first $10k human genome technology
More informationProbability theory basics
Probability theory basics Michael Franke Basics of probability theory: axiomatic definition, interpretation, joint distributions, marginalization, conditional probability & Bayes rule. Random variables:
More information3 Joint Distributions 71
2.2.3 The Normal Distribution 54 2.2.4 The Beta Density 58 2.3 Functions of a Random Variable 58 2.4 Concluding Remarks 64 2.5 Problems 64 3 Joint Distributions 71 3.1 Introduction 71 3.2 Discrete Random
More informationIntroductory Econometrics. Review of statistics (Part II: Inference)
Introductory Econometrics Review of statistics (Part II: Inference) Jun Ma School of Economics Renmin University of China October 1, 2018 1/16 Null and alternative hypotheses Usually, we have two competing
More informationAccouncements. You should turn in a PDF and a python file(s) Figure for problem 9 should be in the PDF
Accouncements You should turn in a PDF and a python file(s) Figure for problem 9 should be in the PDF Please do not zip these files and submit (unless there are >5 files) 1 Bayesian Methods Machine Learning
More informationexp{ (x i) 2 i=1 n i=1 (x i a) 2 (x i ) 2 = exp{ i=1 n i=1 n 2ax i a 2 i=1
4 Hypothesis testing 4. Simple hypotheses A computer tries to distinguish between two sources of signals. Both sources emit independent signals with normally distributed intensity, the signals of the first
More informationMaximum-Likelihood Estimation: Basic Ideas
Sociology 740 John Fox Lecture Notes Maximum-Likelihood Estimation: Basic Ideas Copyright 2014 by John Fox Maximum-Likelihood Estimation: Basic Ideas 1 I The method of maximum likelihood provides estimators
More informationA Note on Hypothesis Testing with Random Sample Sizes and its Relationship to Bayes Factors
Journal of Data Science 6(008), 75-87 A Note on Hypothesis Testing with Random Sample Sizes and its Relationship to Bayes Factors Scott Berry 1 and Kert Viele 1 Berry Consultants and University of Kentucky
More informationIntroduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016
Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An
More informationBayesian inference. Justin Chumbley ETH and UZH. (Thanks to Jean Denizeau for slides)
Bayesian inference Justin Chumbley ETH and UZH (Thanks to Jean Denizeau for slides) Overview of the talk Introduction: Bayesian inference Bayesian model comparison Group-level Bayesian model selection
More informationBayesian inference J. Daunizeau
Bayesian inference J. Daunizeau Brain and Spine Institute, Paris, France Wellcome Trust Centre for Neuroimaging, London, UK Overview of the talk 1 Probabilistic modelling and representation of uncertainty
More informationInverse Sampling for McNemar s Test
International Journal of Statistics and Probability; Vol. 6, No. 1; January 27 ISSN 1927-7032 E-ISSN 1927-7040 Published by Canadian Center of Science and Education Inverse Sampling for McNemar s Test
More informationTest Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics
Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics The candidates for the research course in Statistics will have to take two shortanswer type tests
More informationSome slides from Carlos Guestrin, Luke Zettlemoyer & K Gajos 2
Logistics CSE 446: Point Estimation Winter 2012 PS2 out shortly Dan Weld Some slides from Carlos Guestrin, Luke Zettlemoyer & K Gajos 2 Last Time Random variables, distributions Marginal, joint & conditional
More informationStatistics for the LHC Lecture 1: Introduction
Statistics for the LHC Lecture 1: Introduction Academic Training Lectures CERN, 14 17 June, 2010 indico.cern.ch/conferencedisplay.py?confid=77830 Glen Cowan Physics Department Royal Holloway, University
More informationMaximum Likelihood (ML) Estimation
Econometrics 2 Fall 2004 Maximum Likelihood (ML) Estimation Heino Bohn Nielsen 1of32 Outline of the Lecture (1) Introduction. (2) ML estimation defined. (3) ExampleI:Binomialtrials. (4) Example II: Linear
More informationComputational Cognitive Science
Computational Cognitive Science Lecture 9: Bayesian Estimation Chris Lucas (Slides adapted from Frank Keller s) School of Informatics University of Edinburgh clucas2@inf.ed.ac.uk 17 October, 2017 1 / 28
More informationThe University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80
The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80 71. Decide in each case whether the hypothesis is simple
More informationStatistical Distribution Assumptions of General Linear Models
Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions
More information7. Estimation and hypothesis testing. Objective. Recommended reading
7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing
More informationBios 6649: Clinical Trials - Statistical Design and Monitoring
Bios 6649: Clinical Trials - Statistical Design and Monitoring Spring Semester 2015 John M. Kittelson Department of Biostatistics & nformatics Colorado School of Public Health University of Colorado Denver
More informationBayesian Statistics. Debdeep Pati Florida State University. February 11, 2016
Bayesian Statistics Debdeep Pati Florida State University February 11, 2016 Historical Background Historical Background Historical Background Brief History of Bayesian Statistics 1764-1838: called probability
More informationFoundations of Statistical Inference
Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2016 Julien Berestycki (University of Oxford) SB2a MT 2016 1 / 20 Lecture 6 : Bayesian Inference
More information