2.3 Analysis of Categorical Data


CHAPTER 2. ESTIMATION AND HYPOTHESIS TESTING

The Multinomial Probability Distribution

A multinomial random variable is a generalization of the binomial rv. It results from experiments consisting of $n$ trials with $k$ possible outcomes per trial, where $k \ge 2$. For $k = 2$ it is the binomial variable. Examples of multinomial distributions come from experiments with several categories, such as blood types O, A, B, AB (discrete rv) or income ranges [0, 5K), [5K, 10K), [10K, 20K), ... (continuous rv).

A multinomial experiment has the following properties:

1. The experiment consists of $n$ identical trials.
2. The outcome of each trial falls into one of $k$ classes (cells, categories).
3. The probability $p_i$ that the outcome of a single trial falls into category $i$ is constant from trial to trial.
4. The trials are independent.
5. The random variables of interest are $Y = (Y_1, \dots, Y_k)$, where $Y_i$ denotes the number of trials whose outcomes fall into category $i$.

Note that
$$p_1 + \dots + p_k = 1 \quad \text{and} \quad Y_1 + \dots + Y_k = n.$$
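The five properties above can be illustrated with a small simulation. This is only a sketch: the function name `simulate_multinomial` and the blood-type probabilities are hypothetical choices made for illustration and are not taken from the notes.

```python
import random
from collections import Counter

def simulate_multinomial(n, probs, seed=None):
    """Run n independent trials with k = len(probs) categories.

    Returns the count vector (Y_1, ..., Y_k); category i has the same
    probability probs[i] on every trial.
    """
    rng = random.Random(seed)
    categories = list(range(len(probs)))
    # Each draw is one trial; draws are independent and identically distributed.
    draws = rng.choices(categories, weights=probs, k=n)
    tally = Counter(draws)
    return [tally.get(i, 0) for i in categories]

# Hypothetical probabilities for blood types O, A, B, AB (they sum to 1).
blood_probs = [0.45, 0.40, 0.11, 0.04]
counts = simulate_multinomial(1000, blood_probs, seed=1)
print(counts, sum(counts))  # the counts always add up to n
```

Whatever the seed, the counts add up to $n$, which is the constraint $Y_1 + \dots + Y_k = n$ noted above.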

Definition 2.8. The random variables $Y = (Y_1, \dots, Y_k)$ have a multinomial distribution with parameters $n$ and $p_1, \dots, p_k$ if the joint pmf of $Y$ is given by
$$p_Y(y_1, \dots, y_k) = \frac{n!}{y_1!\, y_2! \cdots y_k!}\; p_1^{y_1} p_2^{y_2} \cdots p_k^{y_k},$$
where $p_i > 0$, $\sum_{i=1}^{k} p_i = 1$, $\sum_{i=1}^{k} y_i = n$ and $y_i = 0, 1, 2, \dots, n$.

The marginal distribution of $Y_i$ is binomial with parameters $n$ and $p_i$. Indeed, if we merge all categories except category $i$, then each outcome of a trial falls either into cell $i$ or into the merged cell. So we have either success or failure, with probabilities $p_i$ and $1 - p_i$ respectively. Then
$$E(Y_i) = np_i \quad \text{and} \quad \operatorname{var}(Y_i) = np_i(1 - p_i).$$

Chi-Square Goodness of Fit Tests

Fully Specified Distribution

Suppose we are interested in testing a hypothesis that the cell probabilities take some specified values, that is,
$$H_0: p_1 = p_{10},\ p_2 = p_{20},\ \dots,\ p_k = p_{k0}$$
$$H_1: \neg H_0,$$
where $\neg H_0$ means that the null hypothesis is not true. For given $n$, under $H_0$ we have a fully specified multinomial distribution, and the expected numbers of observations falling into the categories are
$$E(Y_i) = np_i, \quad i = 1, \dots, k,$$
with $p_i = p_{i0}$.

Karl Pearson derived a test statistic based on the idea that if $H_0$ is true, then the random variables $Y_i$ should not differ much from their expected values. The test statistic is given in the following theorem.

Theorem 2.3. The statistic
$$X^2 = \sum_{i=1}^{k} \frac{(Y_i - np_i)^2}{np_i} \qquad (2.2)$$
has, asymptotically, the $\chi^2_{k-1}$ distribution.

Proof. We will show the result for $k = 2$. In this case $Y_2 = n - Y_1$ and $p_2 = 1 - p_1$, so
$$X^2 = \frac{(Y_1 - np_1)^2}{np_1} + \frac{(Y_2 - np_2)^2}{np_2}
= \frac{(Y_1 - np_1)^2}{np_1} + \frac{[n - Y_1 - n(1 - p_1)]^2}{n(1 - p_1)}$$
$$= \frac{(Y_1 - np_1)^2}{np_1} + \frac{[-Y_1 + np_1]^2}{n(1 - p_1)}
= \frac{(Y_1 - np_1)^2}{np_1(1 - p_1)}
= \left( \frac{Y_1 - np_1}{\sqrt{np_1(1 - p_1)}} \right)^2.$$

By the Central Limit Theorem, for large $n$ the standardized random variable $Y_1$ has, approximately, a standard normal distribution, i.e.,
$$\frac{Y_1 - np_1}{\sqrt{np_1(1 - p_1)}} \sim N(0, 1), \quad \text{approximately}.$$
That is,
$$\left( \frac{Y_1 - np_1}{\sqrt{np_1(1 - p_1)}} \right)^2 \sim \chi^2_1, \quad \text{approximately}.$$

As a rejection region for such a test we choose the right-hand tail of the chi-squared distribution. A small value of the test statistic, close to zero, does not contradict the null hypothesis, because it means that the values of the rvs are not far from their expectations.
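The algebra in the proof can be checked numerically. The sketch below computes Pearson's statistic (2.2) for a two-cell case and compares it with the squared standardized count; the function name `pearson_statistic` and the values $n = 50$, $p_1 = 0.3$, $y_1 = 19$ are arbitrary illustrative choices, not data from the notes.

```python
from math import sqrt, isclose

def pearson_statistic(observed, probs):
    """Pearson's X^2 = sum_i (y_i - n p_i)^2 / (n p_i) for fully specified p_i."""
    n = sum(observed)
    return sum((y - n * p) ** 2 / (n * p) for y, p in zip(observed, probs))

# Arbitrary two-cell illustration (k = 2), so Y_2 = n - Y_1 and p_2 = 1 - p_1.
n, p1, y1 = 50, 0.3, 19
x2 = pearson_statistic([y1, n - y1], [p1, 1 - p1])

# Standardized binomial count used in the Central Limit Theorem step.
z = (y1 - n * p1) / sqrt(n * p1 * (1 - p1))

print(x2, z ** 2, isclose(x2, z ** 2))
```

For $k = 2$ the two numbers agree up to floating-point rounding, which is the identity the proof relies on.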

Example. According to genetic theory, the seeds collected from a field of pink pea should produce plants with white, pink or red flowers in the proportion 1:2:1. Of 400 plants grown from such seeds, 93 had white flowers, 211 had pink flowers and 96 had red flowers. Do these results contradict the genetic theory?

Let $X$ denote a random variable with the discrete distribution given by
$$P(X = i) = \begin{cases} \frac{1}{4}, & \text{if } i = 1; \\ \frac{1}{2}, & \text{if } i = 2; \\ \frac{1}{4}, & \text{if } i = 3, \end{cases}$$
where $i = 1, 2, 3$ denotes white, pink and red colour, respectively. Here we have a fully specified distribution. The question is whether the data give evidence against this distribution.

If this distribution is true, then the expected numbers of pea plants with white, pink and red flowers are, respectively, $n \cdot \frac{1}{4}$, $n \cdot \frac{1}{2}$ and $n \cdot \frac{1}{4}$. In the experiment $n = 400$ plants were observed, hence we should expect the numbers to be 100, 200 and 100.

Denote the cumulative distribution function of this rv by $F_0$. Then we can write the null and alternative hypotheses as
$$H_0: F(x) = F_0(x) \text{ for all } x$$
$$H_1: \neg H_0.$$

The observed and expected values are often put together in a frequency table. Here, we have

| Category | White | Pink | Red |
|---|---|---|---|
| Observed frequency | 93 | 211 | 96 |
| Expected frequency | 100 | 200 | 100 |

The value of the test statistic is
$$X^2_{obs} = \frac{(93 - 100)^2}{100} + \frac{(211 - 200)^2}{200} + \frac{(96 - 100)^2}{100} = 1.255.$$
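The arithmetic of this example can be reproduced in a few lines. The sketch below is illustrative only; it also uses the fact that a chi-square distribution with two degrees of freedom has upper-tail probability $e^{-x/2}$, which gives a p-value and the 10% critical value in closed form without statistical tables. The variable names are assumptions made for the illustration.

```python
from math import exp, log

observed = [93, 211, 96]   # white, pink, red (from the example)
ratios = [1, 2, 1]         # proportions claimed by the genetic theory

n = sum(observed)
expected = [n * r / sum(ratios) for r in ratios]

# Pearson's statistic (2.2): sum of (observed - expected)^2 / expected.
x2_obs = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# For 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
p_value = exp(-x2_obs / 2)
critical_10pct = -2 * log(0.10)   # solves exp(-x / 2) = 0.10

print(expected, x2_obs, p_value, critical_10pct)
print("reject H0 at alpha = 0.1:", x2_obs > critical_10pct)
```

The closed-form tail is specific to two degrees of freedom; for other degrees of freedom a table or a statistics library would be needed.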

For $\alpha = 0.1$ we have $\chi^2_{2;0.1} = 4.605$, which is larger than $X^2_{obs}$. Hence, there is no evidence to reject the null hypothesis. The experimental data do not contradict the hypothesis that the proportions of the white, pink and red flowers should be 1:2:1.

Families of Distributions

The chi-square goodness of fit tests are also used to verify hypotheses about families of distributions. For example, we may need to check that a random variable comes from a normal population (a common assumption of many tests) or that a Poisson distribution may be used to model some observations. These tests are also based on categories, which may be sets of values in a discrete case or real intervals in a continuous case. However, the probabilities that the variable is in a given category now depend on unknown parameters of the distribution which we test. In such a situation we replace the parameters with their estimates and use the following chi-square test statistic:
$$X^2 = \sum_{i=1}^{k} \frac{\left(Y_i - np_i(\hat{\vartheta})\right)^2}{np_i(\hat{\vartheta})} \sim \chi^2_{k-p-1}, \quad \text{approximately}, \qquad (2.3)$$
where $p$ is the number of estimated parameters $\vartheta = (\vartheta_1, \dots, \vartheta_p)$.

Example. Discrete random variable. The number of accidents per week at an intersection was checked for $n = 50$ weeks, with the results given in the table below.

| $y$ | 0 | 1 | 2 | 3 or more |
|---|---|---|---|---|
| Observed frequency | 32 | 12 | 6 | 0 |

Test the hypothesis that the random variable $Y$ has a Poisson distribution, assuming that the observations are independent.

The null and alternative hypotheses are
$$H_0: Y \sim \text{Poisson}(\lambda)$$
$$H_1: \neg H_0.$$
If the null hypothesis is true, then the pmf is
$$P(Y = y; \lambda) = \frac{\lambda^y e^{-\lambda}}{y!}, \quad y = 0, 1, 2, \dots$$

To calculate the estimates of the expected frequencies $np_i(\hat{\lambda})$ we need to calculate the estimate of $\lambda$. A good estimator of $\lambda$ is the sample mean $\bar{Y}$. It is the so-called Maximum Likelihood Estimator, that is, the value of the parameter under which the given data set is most probable. Here we have
$$\hat{\lambda}_{obs} = \frac{1}{50}(0 \cdot 32 + 1 \cdot 12 + 2 \cdot 6 + 3 \cdot 0) = \frac{24}{50} = 0.48.$$

Note that the fourth category has no entries. In fact, categories with very small numbers of observations need to be merged with the neighbouring cells so that no category has fewer than five entries. In this case we merge the last cell with the third one. Under the null hypothesis, the probabilities for these cells are
$$p_1(\lambda) = P(Y = 0) = e^{-\lambda},$$
$$p_2(\lambda) = P(Y = 1) = \lambda e^{-\lambda},$$
$$p_3(\lambda) = 1 - p_1(\lambda) - p_2(\lambda).$$

Thus, for the given data we obtain the following estimates of the expected frequencies:
$$n p_1(\hat{\lambda}_{obs}) = 50\, e^{-0.48} = 30.95,$$
$$n p_2(\hat{\lambda}_{obs}) = 50 \cdot 0.48\, e^{-0.48} = 14.85,$$
$$n p_3(\hat{\lambda}_{obs}) = 50\,(1 - e^{-0.48} - 0.48\, e^{-0.48}) = 4.20.$$

This gives the frequency table

| $y$ | 0 | 1 | 2 or more |
|---|---|---|---|
| Observed frequency | 32 | 12 | 6 |
| Estimated expected frequency | 30.95 | 14.85 | 4.20 |

and the value of the test statistic (2.3) with $k - p - 1 = 3 - 1 - 1 = 1$ degree of freedom:
$$X^2_{obs} = \frac{(32 - 30.95)^2}{30.95} + \frac{(12 - 14.85)^2}{14.85} + \frac{(6 - 4.20)^2}{4.20} = 1.354.$$

The upper $100\alpha\%$ point of the chi-square distribution with one degree of freedom at $\alpha = 0.1$ is $\chi^2_{1;0.1} = 2.706$. Hence, there is no evidence in the data to reject the null hypothesis, which says that the number of accidents at the junction follows a Poisson distribution.

Example. Continuous random variable. An astronomer is interested in the numbers of cloudless nights at a prospective telescope site. He obtained the average number of cloudless nights per year over the last 87 years, $\bar{x} = \dots$, and the estimate of the variance, $s^2 = \dots$. He also has available counts of years with numbers of cloudless nights falling in the intervals presented below.

| Interval | Observed frequency $y_i$ |
|---|---|
| 160 or below | … |
| … | … |
| … or above | 2 |

The question is whether $X$, the number of cloudless nights at the site, can be modelled by a normal distribution. That is, the null hypothesis is
$$H_0: X \sim N(\mu, \sigma^2).$$

We can use the test statistic (2.3), but some of the classes have too few entries. Hence, we will combine the first three cells and also the last two cells. Also, to obtain estimates of the expected frequencies, we need to calculate the probabilities of $X$ belonging to each class, that is
$$P(a_i < X < b_i) = P\left( \frac{a_i - \bar{X}}{S} < Z < \frac{b_i - \bar{X}}{S} \right) \approx \Phi\left( \frac{b_i - \bar{X}}{S} \right) - \Phi\left( \frac{a_i - \bar{X}}{S} \right),$$
where $a_i$ and $b_i$ denote the limits of class $i$ and $\Phi(z)$ is the cdf of $Z \sim N(0, 1)$. For example,
$$P(200 < X < 220) \approx \Phi\left( \frac{220 - \bar{x}}{s} \right) - \Phi\left( \frac{200 - \bar{x}}{s} \right) = 1 - \Phi(0.7717) - (1 - \Phi(1.4354)) = \Phi(1.4354) - \Phi(0.7717) \approx 0.145.$$
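The class probability above can be evaluated without normal tables by writing $\Phi$ in terms of the error function. This is a minimal sketch: the helper names `std_normal_cdf` and `class_probability` are hypothetical, and because the sample mean and standard deviation are not reproduced here, the check uses the standardized limits $-1.4354$ and $-0.7717$ that appear in the worked example.

```python
from math import erf, sqrt, isclose

def std_normal_cdf(z):
    """Phi(z) for Z ~ N(0, 1), written via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def class_probability(a, b, mean, sd):
    """Estimate P(a < X < b) for X ~ N(mean, sd^2) using standardized limits."""
    return std_normal_cdf((b - mean) / sd) - std_normal_cdf((a - mean) / sd)

# Standardized limits of the class (200, 220) from the worked example.
p_direct = std_normal_cdf(-0.7717) - std_normal_cdf(-1.4354)

# The same probability after using the symmetry Phi(-z) = 1 - Phi(z).
p_symmetric = std_normal_cdf(1.4354) - std_normal_cdf(0.7717)

print(p_direct, p_symmetric, isclose(p_direct, p_symmetric))
```

Multiplying such a probability by $n$ gives the estimated expected frequency of the class; an open-ended class such as "200 or below" can be handled by passing `float("-inf")` as the lower limit.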

This gives the values $\hat{p}_i$, and so it gives estimates of the expected frequencies $n\hat{p}_i$ for each class, as shown in the table below.

| Interval | Observed frequency $y_i$ | Estimated cell probability $\hat{p}_i$ | Estimated frequency $n\hat{p}_i$ |
|---|---|---|---|
| 200 or below | … | … | … |
| … | … | … | … |
| … or above | … | … | … |

The observed value of the test statistic is $X^2_{obs} = \dots$. There are three degrees of freedom, and the rejection region at the $\alpha = 0.05$ level of significance is $(7.815, \infty)$, while at $\alpha = 0.01$ it is $(11.34, \infty)$. Hence, the data give some evidence against the null hypothesis, but it is not strong evidence, as at $\alpha = 0.01$ the value of the test statistic is not in the rejection region.
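As a recap of the estimated-parameter test (2.3), the accident-count example can be recomputed end to end. The sketch below is illustrative, not a definitive implementation. It assumes weekly counts of 32, 12, 6 and 0 for $y = 0, 1, 2$ and "3 or more", which are the counts consistent with $n = 50$, $\hat{\lambda}_{obs} = 0.48$, an empty fourth cell and a merged last cell of 6 entries. It also uses the fact that the chi-square upper tail with one degree of freedom equals $\operatorname{erfc}(\sqrt{x/2})$.

```python
from math import exp, erfc, sqrt

# Weekly accident counts: value y -> number of weeks (assumed as described above).
counts = {0: 32, 1: 12, 2: 6, 3: 0}

n = sum(counts.values())
lam_hat = sum(y * f for y, f in counts.items()) / n   # sample mean, the MLE of lambda

# Merge the sparse last cell into the third one: cells are {0}, {1}, {2 or more}.
observed = [counts[0], counts[1], counts[2] + counts[3]]

# Poisson cell probabilities with lambda replaced by its estimate.
p1 = exp(-lam_hat)
p2 = lam_hat * exp(-lam_hat)
p3 = 1.0 - p1 - p2
expected = [n * p for p in (p1, p2, p3)]

x2_obs = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1          # k - p - 1 with one estimated parameter

# Upper tail of chi-square with 1 degree of freedom: P(Z^2 > x) = erfc(sqrt(x / 2)).
p_value = erfc(sqrt(x2_obs / 2))

print(lam_hat, [round(e, 2) for e in expected], x2_obs, df, p_value)
```

Because the expected frequencies are not rounded to two decimals here, the statistic differs slightly from a hand calculation based on 30.95, 14.85 and 4.20, but it stays well below the critical value 2.706.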

TUTORIAL 8 SOLUTIONS #

TUTORIAL 8 SOLUTIONS # TUTORIAL 8 SOLUTIONS #9.11.21 Suppose that a single observation X is taken from a uniform density on [0,θ], and consider testing H 0 : θ = 1 versus H 1 : θ =2. (a) Find a test that has significance level

More information

Chapter 10. Chapter 10. Multinomial Experiments and. Multinomial Experiments and Contingency Tables. Contingency Tables.

Chapter 10. Chapter 10. Multinomial Experiments and. Multinomial Experiments and Contingency Tables. Contingency Tables. Chapter 10 Multinomial Experiments and Contingency Tables 1 Chapter 10 Multinomial Experiments and Contingency Tables 10-1 1 Overview 10-2 2 Multinomial Experiments: of-fitfit 10-3 3 Contingency Tables:

More information

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

 M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2 Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the

More information

Ling 289 Contingency Table Statistics

Ling 289 Contingency Table Statistics Ling 289 Contingency Table Statistics Roger Levy and Christopher Manning This is a summary of the material that we ve covered on contingency tables. Contingency tables: introduction Odds ratios Counting,

More information

Summary of Chapters 7-9

Summary of Chapters 7-9 Summary of Chapters 7-9 Chapter 7. Interval Estimation 7.2. Confidence Intervals for Difference of Two Means Let X 1,, X n and Y 1, Y 2,, Y m be two independent random samples of sizes n and m from two

More information

11-2 Multinomial Experiment

11-2 Multinomial Experiment Chapter 11 Multinomial Experiments and Contingency Tables 1 Chapter 11 Multinomial Experiments and Contingency Tables 11-11 Overview 11-2 Multinomial Experiments: Goodness-of-fitfit 11-3 Contingency Tables:

More information

ML Testing (Likelihood Ratio Testing) for non-gaussian models

ML Testing (Likelihood Ratio Testing) for non-gaussian models ML Testing (Likelihood Ratio Testing) for non-gaussian models Surya Tokdar ML test in a slightly different form Model X f (x θ), θ Θ. Hypothesist H 0 : θ Θ 0 Good set: B c (x) = {θ : l x (θ) max θ Θ l

More information

Math Review Sheet, Fall 2008

Math Review Sheet, Fall 2008 1 Descriptive Statistics Math 3070-5 Review Sheet, Fall 2008 First we need to know about the relationship among Population Samples Objects The distribution of the population can be given in one of the

More information

Institute of Actuaries of India

Institute of Actuaries of India Institute of Actuaries of India Subject CT3 Probability & Mathematical Statistics May 2011 Examinations INDICATIVE SOLUTION Introduction The indicative solution has been written by the Examiners with the

More information

Module 10: Analysis of Categorical Data Statistics (OA3102)

Module 10: Analysis of Categorical Data Statistics (OA3102) Module 10: Analysis of Categorical Data Statistics (OA3102) Professor Ron Fricker Naval Postgraduate School Monterey, California Reading assignment: WM&S chapter 14.1-14.7 Revision: 3-12 1 Goals for this

More information

Statistics 3858 : Contingency Tables

Statistics 3858 : Contingency Tables Statistics 3858 : Contingency Tables 1 Introduction Before proceeding with this topic the student should review generalized likelihood ratios ΛX) for multinomial distributions, its relation to Pearson

More information

GEOMETRIC -discrete A discrete random variable R counts number of times needed before an event occurs

GEOMETRIC -discrete A discrete random variable R counts number of times needed before an event occurs STATISTICS 4 Summary Notes. Geometric and Exponential Distributions GEOMETRIC -discrete A discrete random variable R counts number of times needed before an event occurs P(X = x) = ( p) x p x =,, 3,...

More information

Lecture 3. Discrete Random Variables

Lecture 3. Discrete Random Variables Math 408 - Mathematical Statistics Lecture 3. Discrete Random Variables January 23, 2013 Konstantin Zuev (USC) Math 408, Lecture 3 January 23, 2013 1 / 14 Agenda Random Variable: Motivation and Definition

More information

Contents 1. Contents

Contents 1. Contents Contents 1 Contents 6 Distributions of Functions of Random Variables 2 6.1 Transformation of Discrete r.v.s............. 3 6.2 Method of Distribution Functions............. 6 6.3 Method of Transformations................

More information

Math 152. Rumbos Fall Solutions to Exam #2

Math 152. Rumbos Fall Solutions to Exam #2 Math 152. Rumbos Fall 2009 1 Solutions to Exam #2 1. Define the following terms: (a) Significance level of a hypothesis test. Answer: The significance level, α, of a hypothesis test is the largest probability

More information

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Review of Basic Probability The fundamentals, random variables, probability distributions Probability mass/density functions

More information

Binomial and Poisson Probability Distributions

Binomial and Poisson Probability Distributions Binomial and Poisson Probability Distributions Esra Akdeniz March 3, 2016 Bernoulli Random Variable Any random variable whose only possible values are 0 or 1 is called a Bernoulli random variable. What

More information

Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments

Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments We consider two kinds of random variables: discrete and continuous random variables. For discrete random

More information

Statistical Methods in Particle Physics

Statistical Methods in Particle Physics Statistical Methods in Particle Physics Lecture 3 October 29, 2012 Silvia Masciocchi, GSI Darmstadt s.masciocchi@gsi.de Winter Semester 2012 / 13 Outline Reminder: Probability density function Cumulative

More information

Probability Distributions Columns (a) through (d)

Probability Distributions Columns (a) through (d) Discrete Probability Distributions Columns (a) through (d) Probability Mass Distribution Description Notes Notation or Density Function --------------------(PMF or PDF)-------------------- (a) (b) (c)

More information

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification, Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability

More information

A Probability Primer. A random walk down a probabilistic path leading to some stochastic thoughts on chance events and uncertain outcomes.

A Probability Primer. A random walk down a probabilistic path leading to some stochastic thoughts on chance events and uncertain outcomes. A Probability Primer A random walk down a probabilistic path leading to some stochastic thoughts on chance events and uncertain outcomes. Are you holding all the cards?? Random Events A random event, E,

More information

MATH4427 Notebook 2 Fall Semester 2017/2018

MATH4427 Notebook 2 Fall Semester 2017/2018 MATH4427 Notebook 2 Fall Semester 2017/2018 prepared by Professor Jenny Baglivo c Copyright 2009-2018 by Jenny A. Baglivo. All Rights Reserved. 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................

More information

Testing Independence

Testing Independence Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1

More information

STAT Chapter 13: Categorical Data. Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure).

STAT Chapter 13: Categorical Data. Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure). STAT 515 -- Chapter 13: Categorical Data Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure). Many studies allow for more than 2 categories. Example

More information

4.5.1 The use of 2 log Λ when θ is scalar

4.5.1 The use of 2 log Λ when θ is scalar 4.5. ASYMPTOTIC FORM OF THE G.L.R.T. 97 4.5.1 The use of 2 log Λ when θ is scalar Suppose we wish to test the hypothesis NH : θ = θ where θ is a given value against the alternative AH : θ θ on the basis

More information

Lecture 21: October 19

Lecture 21: October 19 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 21: October 19 21.1 Likelihood Ratio Test (LRT) To test composite versus composite hypotheses the general method is to use

More information

Sociology 6Z03 Review II

Sociology 6Z03 Review II Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability

More information

S2 QUESTIONS TAKEN FROM JANUARY 2006, JANUARY 2007, JANUARY 2008, JANUARY 2009

S2 QUESTIONS TAKEN FROM JANUARY 2006, JANUARY 2007, JANUARY 2008, JANUARY 2009 S2 QUESTIONS TAKEN FROM JANUARY 2006, JANUARY 2007, JANUARY 2008, JANUARY 2009 SECTION 1 The binomial and Poisson distributions. Students will be expected to use these distributions to model a real-world

More information

Statistics. Statistics

Statistics. Statistics The main aims of statistics 1 1 Choosing a model 2 Estimating its parameter(s) 1 point estimates 2 interval estimates 3 Testing hypotheses Distributions used in statistics: χ 2 n-distribution 2 Let X 1,

More information

Computer Science, Informatik 4 Communication and Distributed Systems. Simulation. Discrete-Event System Simulation. Dr.

Computer Science, Informatik 4 Communication and Distributed Systems. Simulation. Discrete-Event System Simulation. Dr. Simulation Discrete-Event System Simulation Chapter 4 Statistical Models in Simulation Purpose & Overview The world the model-builder sees is probabilistic rather than deterministic. Some statistical model

More information

15 Discrete Distributions

15 Discrete Distributions Lecture Note 6 Special Distributions (Discrete and Continuous) MIT 4.30 Spring 006 Herman Bennett 5 Discrete Distributions We have already seen the binomial distribution and the uniform distribution. 5.

More information

Slides 8: Statistical Models in Simulation

Slides 8: Statistical Models in Simulation Slides 8: Statistical Models in Simulation Purpose and Overview The world the model-builder sees is probabilistic rather than deterministic: Some statistical model might well describe the variations. An

More information

1 Review of Probability and Distributions

1 Review of Probability and Distributions Random variables. A numerically valued function X of an outcome ω from a sample space Ω X : Ω R : ω X(ω) is called a random variable (r.v.), and usually determined by an experiment. We conventionally denote

More information

HYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC

HYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC 1 HYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC 7 steps of Hypothesis Testing 1. State the hypotheses 2. Identify level of significant 3. Identify the critical values 4. Calculate test statistics 5. Compare

More information

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will

More information

(y 1, y 2 ) = 12 y3 1e y 1 y 2 /2, y 1 > 0, y 2 > 0 0, otherwise.

(y 1, y 2 ) = 12 y3 1e y 1 y 2 /2, y 1 > 0, y 2 > 0 0, otherwise. 54 We are given the marginal pdfs of Y and Y You should note that Y gamma(4, Y exponential( E(Y = 4, V (Y = 4, E(Y =, and V (Y = 4 (a With U = Y Y, we have E(U = E(Y Y = E(Y E(Y = 4 = (b Because Y and

More information

Example. χ 2 = Continued on the next page. All cells

Example. χ 2 = Continued on the next page. All cells Section 11.1 Chi Square Statistic k Categories 1 st 2 nd 3 rd k th Total Observed Frequencies O 1 O 2 O 3 O k n Expected Frequencies E 1 E 2 E 3 E k n O 1 + O 2 + O 3 + + O k = n E 1 + E 2 + E 3 + + E

More information

The purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j.

The purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j. Chapter 9 Pearson s chi-square test 9. Null hypothesis asymptotics Let X, X 2, be independent from a multinomial(, p) distribution, where p is a k-vector with nonnegative entries that sum to one. That

More information

Distributions of Functions of Random Variables. 5.1 Functions of One Random Variable

Distributions of Functions of Random Variables. 5.1 Functions of One Random Variable Distributions of Functions of Random Variables 5.1 Functions of One Random Variable 5.2 Transformations of Two Random Variables 5.3 Several Random Variables 5.4 The Moment-Generating Function Technique

More information

3.4. The Binomial Probability Distribution

3.4. The Binomial Probability Distribution 3.4. The Binomial Probability Distribution Objectives. Binomial experiment. Binomial random variable. Using binomial tables. Mean and variance of binomial distribution. 3.4.1. Four Conditions that determined

More information

Summary of Chapter 7 (Sections ) and Chapter 8 (Section 8.1)

Summary of Chapter 7 (Sections ) and Chapter 8 (Section 8.1) Summary of Chapter 7 (Sections 7.2-7.5) and Chapter 8 (Section 8.1) Chapter 7. Tests of Statistical Hypotheses 7.2. Tests about One Mean (1) Test about One Mean Case 1: σ is known. Assume that X N(µ, σ

More information

Probability distributions. Probability Distribution Functions. Probability distributions (contd.) Binomial distribution

Probability distributions. Probability Distribution Functions. Probability distributions (contd.) Binomial distribution Probability distributions Probability Distribution Functions G. Jogesh Babu Department of Statistics Penn State University September 27, 2011 http://en.wikipedia.org/wiki/probability_distribution We discuss

More information

Topic 2: Probability & Distributions. Road Map Probability & Distributions. ECO220Y5Y: Quantitative Methods in Economics. Dr.

Topic 2: Probability & Distributions. Road Map Probability & Distributions. ECO220Y5Y: Quantitative Methods in Economics. Dr. Topic 2: Probability & Distributions ECO220Y5Y: Quantitative Methods in Economics Dr. Nick Zammit University of Toronto Department of Economics Room KN3272 n.zammit utoronto.ca November 21, 2017 Dr. Nick

More information

This paper is not to be removed from the Examination Halls

This paper is not to be removed from the Examination Halls ~~ST104B ZA d0 This paper is not to be removed from the Examination Halls UNIVERSITY OF LONDON ST104B ZB BSc degrees and Diplomas for Graduates in Economics, Management, Finance and the Social Sciences,

More information

One-Way Tables and Goodness of Fit

One-Way Tables and Goodness of Fit Stat 504, Lecture 5 1 One-Way Tables and Goodness of Fit Key concepts: One-way Frequency Table Pearson goodness-of-fit statistic Deviance statistic Pearson residuals Objectives: Learn how to compute the

More information

Statistics 224 Solution key to EXAM 2 FALL 2007 Friday 11/2/07 Professor Michael Iltis (Lecture 2)

Statistics 224 Solution key to EXAM 2 FALL 2007 Friday 11/2/07 Professor Michael Iltis (Lecture 2) NOTE : For the purpose of review, I have added some additional parts not found on the original exam. These parts are indicated with a ** beside them Statistics 224 Solution key to EXAM 2 FALL 2007 Friday

More information

Chapter 4 Multiple Random Variables

Chapter 4 Multiple Random Variables Review for the previous lecture Theorems and Examples: How to obtain the pmf (pdf) of U = g ( X Y 1 ) and V = g ( X Y) Chapter 4 Multiple Random Variables Chapter 43 Bivariate Transformations Continuous

More information

j=1 π j = 1. Let X j be the number

j=1 π j = 1. Let X j be the number THE χ 2 TEST OF SIMPLE AND COMPOSITE HYPOTHESES 1. Multinomial distributions Suppose we have a multinomial (n,π 1,...,π k ) distribution, where π j is the probability of the jth of k possible outcomes

More information

Inference for Categorical Data. Chi-Square Tests for Goodness of Fit and Independence

Inference for Categorical Data. Chi-Square Tests for Goodness of Fit and Independence Chi-Square Tests for Goodness of Fit and Independence Chi-Square Tests In this course, we use chi-square tests in two different ways The chi-square test for goodness-of-fit is used to determine whether

More information

Chapter 5. Statistical Models in Simulations 5.1. Prof. Dr. Mesut Güneş Ch. 5 Statistical Models in Simulations

Chapter 5. Statistical Models in Simulations 5.1. Prof. Dr. Mesut Güneş Ch. 5 Statistical Models in Simulations Chapter 5 Statistical Models in Simulations 5.1 Contents Basic Probability Theory Concepts Discrete Distributions Continuous Distributions Poisson Process Empirical Distributions Useful Statistical Models

More information

Review. December 4 th, Review

Review. December 4 th, Review December 4 th, 2017 Att. Final exam: Course evaluation Friday, 12/14/2018, 10:30am 12:30pm Gore Hall 115 Overview Week 2 Week 4 Week 7 Week 10 Week 12 Chapter 6: Statistics and Sampling Distributions Chapter

More information

Unit 9: Inferences for Proportions and Count Data

Unit 9: Inferences for Proportions and Count Data Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 1/15/008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)

More information

Section VII. Chi-square test for comparing proportions and frequencies. F test for means

Section VII. Chi-square test for comparing proportions and frequencies. F test for means Section VII Chi-square test for comparing proportions and frequencies F test for means 0 proportions: chi-square test Z test for comparing proportions between two independent groups Z = P 1 P 2 SE d SE

More information

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University Lecture 25 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University November 24, 2015 1 2 3 4 5 6 7 8 9 10 11 1 Hypothesis s of homgeneity 2 Estimating risk

More information

2.5.3 Generalized likelihood ratio tests

2.5.3 Generalized likelihood ratio tests 25 HYPOTHESIS TESTING 127 253 Generalized likelihood ratio tests When a UMP test does not exist, we usually use a generalized likelihood ratio test to verify H 0 : ϑ Θ against H 1 : ϑ Θ\Θ It can be used

More information

1.6 Families of Distributions

1.6 Families of Distributions Your text 1.6. FAMILIES OF DISTRIBUTIONS 15 F(x) 0.20 1.0 0.15 0.8 0.6 Density 0.10 cdf 0.4 0.05 0.2 0.00 a b c 0.0 x Figure 1.1: N(4.5, 2) Distribution Function and Cumulative Distribution Function for

More information

13.1 Categorical Data and the Multinomial Experiment

13.1 Categorical Data and the Multinomial Experiment Chapter 13 Categorical Data Analysis 13.1 Categorical Data and the Multinomial Experiment Recall Variable: (numerical) variable (i.e. # of students, temperature, height,). (non-numerical, categorical)

More information

This does not cover everything on the final. Look at the posted practice problems for other topics.

This does not cover everything on the final. Look at the posted practice problems for other topics. Class 7: Review Problems for Final Exam 8.5 Spring 7 This does not cover everything on the final. Look at the posted practice problems for other topics. To save time in class: set up, but do not carry

More information

STAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and Q-Q plots. March 8, 2015

STAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and Q-Q plots. March 8, 2015 STAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and Q-Q plots March 8, 2015 The duality between CI and hypothesis testing The duality between CI and hypothesis

More information

2.6.3 Generalized likelihood ratio tests

2.6.3 Generalized likelihood ratio tests 26 HYPOTHESIS TESTING 113 263 Generalized likelihood ratio tests When a UMP test does not exist, we usually use a generalized likelihood ratio test to verify H 0 : θ Θ against H 1 : θ Θ\Θ It can be used

More information

Tables Table A Table B Table C Table D Table E 675

Tables Table A Table B Table C Table D Table E 675 BMTables.indd Page 675 11/15/11 4:25:16 PM user-s163 Tables Table A Standard Normal Probabilities Table B Random Digits Table C t Distribution Critical Values Table D Chi-square Distribution Critical Values

More information
