Multiple Sample Categorical Data
Multiple Sample Categorical Data
paired and unpaired data, goodness-of-fit testing, testing for independence
University of California, San Diego
Instructor: Ery Arias-Castro
1 / 25

Testing whether two dice have the same distribution

Suppose we want to know whether two irregular 6-faced dice, with faces numbered 1 through 6 as usual, have the same chance of landing on any given digit.

NOTE: The question is not whether the dice are fair, but whether they have the same distribution.

To determine this, we throw the first die m = 500 times, obtaining X_1, ..., X_m ∈ {1, ..., 6}, and then throw the second die n = 500 times as well, obtaining Y_1, ..., Y_n ∈ {1, ..., 6}. (We assume all the throws are independent of each other.)

NOTE: In principle, m and n can be different, although for a given total sample size m + n, it is best to choose m = n when possible.

We then test
  H_0: the dice X and Y have the same distribution
versus
  H_1: the dice X and Y have different distributions
2 / 25

Summary statistics. The counts
  M_s = #{i : X_i = s}, s = 1, ..., 6
  N_s = #{i : Y_i = s}, s = 1, ..., 6
are (jointly) sufficient, and can be displayed in a table as follows:

  Digit   1    2    3    4    5    6   Total
  X      M_1  M_2  M_3  M_4  M_5  M_6    m
  Y      N_1  N_2  N_3  N_4  N_5  N_6    n

Graphics. The plots of choice are the following; they offer different advantages:
- segmented barplots
- side-by-side barplots
3 / 25
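As a sketch of how such data might be tabulated in practice (not part of the slides; the face probabilities and the random seed are made up for illustration), here is one way to generate the two samples and compute the counts M_s and N_s:

```python
import numpy as np

rng = np.random.default_rng(0)
m = n = 500

# Hypothetical face probabilities for the two irregular dice
# (made up for illustration; neither die is fair).
p_x = [0.20, 0.15, 0.15, 0.15, 0.15, 0.20]
p_y = [0.10, 0.18, 0.18, 0.18, 0.18, 0.18]

faces = np.arange(1, 7)
x = rng.choice(faces, size=m, p=p_x)  # X_1, ..., X_m
y = rng.choice(faces, size=n, p=p_y)  # Y_1, ..., Y_n

# Summary statistics: the counts M_s and N_s, s = 1, ..., 6.
M = np.bincount(x, minlength=7)[1:]
N = np.bincount(y, minlength=7)[1:]

print("digit   :", faces)
print("X counts:", M, "| total:", M.sum())
print("Y counts:", N, "| total:", N.sum())
```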
Chi-squared goodness-of-fit test

The observed counts are
  M_s = #{i : X_i = s}, s = 1, ..., 6
  N_s = #{i : Y_i = s}, s = 1, ..., 6
Under the null, X and Y have the same distribution, say p = (p_1, ..., p_6), and the expected counts are
  E(M_s) = m p_s
  E(N_s) = n p_s
The issue is that we do not know p! (Compare with the one-sample setting, where the null distribution is fully specified.) The idea is to estimate p based on the combined sample:
  \hat{p}_s = (M_s + N_s) / (m + n)
4 / 25

With \hat{p} defined, we can then obtain estimated expected counts
  \hat{E}(M_s) = m \hat{p}_s
  \hat{E}(N_s) = n \hat{p}_s
The final step is to compare the observed and estimated expected counts with the usual chi-squared test statistic:

  D = \sum_{s=1}^{6} \left[ \frac{(M_s - m\hat{p}_s)^2}{m\hat{p}_s} + \frac{(N_s - n\hat{p}_s)^2}{n\hat{p}_s} \right]

Theory. Under the null, D has asymptotically (as m, n → ∞) the chi-squared distribution with 6 - 1 = 5 degrees of freedom.
5 / 25

Two or more dice

The same methodology extends to comparing the distributions of any number k ≥ 2 of dice with the same number of faces S ≥ 2. The sample sizes may be different. The expected counts (under the null) are estimated based on all the samples combined.

Theory. The resulting test statistic has asymptotically (as all the sample sizes diverge) the chi-squared distribution with (k - 1)(S - 1) degrees of freedom.
6 / 25
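The pooled two-sample chi-squared test described above can be sketched in Python as follows; the observed counts here are hypothetical, made up for illustration:

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical observed counts for the two dice (made up for illustration).
M = np.array([90, 80, 85, 75, 85, 85])  # first die, m = 500 throws
N = np.array([70, 95, 80, 90, 80, 85])  # second die, n = 500 throws
m, n = M.sum(), N.sum()

# Estimate the common distribution under the null from the combined sample.
p_hat = (M + N) / (m + n)

# Chi-squared statistic comparing observed and estimated expected counts.
D = np.sum((M - m * p_hat) ** 2 / (m * p_hat)
           + (N - n * p_hat) ** 2 / (n * p_hat))

df = 6 - 1  # = (k - 1)(S - 1) with k = 2 samples and S = 6 faces
p_value = chi2.sf(D, df)
print(f"D = {D:.3f}, p-value = {p_value:.3f}")
```

Computationally this coincides with the chi-squared test of independence applied to the 2 × 6 table of counts, so `scipy.stats.chi2_contingency(np.array([M, N]), correction=False)` returns the same statistic.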
Testing whether two dice are independent of each other

Suppose we now want to know whether, when rolling these dice together, the digits they show are independent. We throw the pair of dice together n = 500 times and record the outcomes, denoted (X_1, Y_1), ..., (X_n, Y_n), with (X_i, Y_i) ∈ {1, ..., 6} × {1, ..., 6}. (We assume the throws are independent.) In this setting, the variables X and Y (the results from the two dice) are paired. We test
  H_0: the dice X and Y are independent
versus
  H_1: the dice X and Y are not independent
7 / 25

Known marginal distributions

First, assume that we know that both dice are fair. (Each die might have been rigorously tested beforehand, based on many trials.) Under the null hypothesis that the dice are independent, we have
  P((X, Y) = (a, b)) = P(X = a) P(Y = b) = (1/6)(1/6) = 1/36, for all a, b ∈ {1, ..., 6}
We can simply apply the chi-squared GOF test to decide whether Z_1, ..., Z_n, where Z_i = (X_i, Y_i), are uniformly distributed over {1, ..., 6} × {1, ..., 6}. After all, the variable Z is just a factor, here with 36 levels, so we are in the one-sample categorical data situation!
8 / 25

Unknown marginal distributions

Now assume that we do not know the distributions of the dice. (This situation is much more common.) Under the null hypothesis, the dice are independent, so that
  P((X, Y) = (a, b)) = P(X = a) P(Y = b), for all a, b ∈ {1, ..., 6}
But now we do not know the marginals P(X = a) or P(Y = b).
9 / 25
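A minimal sketch of this reduction (simulated data, with the dice assumed fair, purely for illustration): code each pair as a single 36-level factor and run a one-sample chi-squared GOF test against the uniform distribution.

```python
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(1)
n = 500

# Simulated throws of a pair of fair, independent dice (illustration only).
x = rng.integers(1, 7, size=n)
y = rng.integers(1, 7, size=n)

# Code each pair (X_i, Y_i) as a single factor Z_i with 36 levels.
z = 6 * (x - 1) + (y - 1)
counts = np.bincount(z, minlength=36)

# One-sample chi-squared GOF test against the uniform distribution
# on {1, ..., 6} x {1, ..., 6}; the statistic has 36 - 1 = 35 df.
stat, p_value = chisquare(counts, f_exp=np.full(36, n / 36))
print(f"chi2 = {stat:.2f}, p-value = {p_value:.3f}")
```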
Contingency table

Summary statistics. The joint counts are sufficient and are used as summary statistics:
  N_{s,t} = #{i : (X_i, Y_i) = (s, t)}
They are organized in a matrix, called a contingency table (here with totals): the rows are indexed by the value of X, the columns by the value of Y, the (s, t) cell holds N_{s,t}, and the margins hold the row sums, the column sums, and the grand total n.

Graphics. The main plots are
- the segmented barplot
- the side-by-side barplot
- the mosaic plot
10 / 25

Chi-squared goodness-of-fit test

The observed counts are
  N_{s,t} = #{i : (X_i, Y_i) = (s, t)}
Under the null, X and Y are independent, say with marginals p and q, and the expected counts are
  E(N_{s,t}) = n P(X = s, Y = t) = n P(X = s) P(Y = t) = n p_s q_t
The issue is that we do not know the marginals, neither p nor q.
11 / 25
The idea is to estimate p and q from the margins. Define the marginal counts
  N_{s,.} = #{i : X_i = s}
  N_{.,t} = #{i : Y_i = t}
and then the estimates
  \hat{p}_s = N_{s,.} / n
  \hat{q}_t = N_{.,t} / n
With \hat{p} and \hat{q} defined, we can then obtain estimated expected counts
  \hat{E}(N_{s,t}) = n \hat{p}_s \hat{q}_t = N_{s,.} N_{.,t} / n
12 / 25

The final step is to compare the observed and estimated expected counts with the usual chi-squared test statistic:

  D = \sum_{s=1}^{6} \sum_{t=1}^{6} \frac{(N_{s,t} - n\hat{p}_s\hat{q}_t)^2}{n\hat{p}_s\hat{q}_t}

Theory. Under the null, D has asymptotically (n → ∞) the chi-squared distribution with (6 - 1)(6 - 1) = 25 degrees of freedom.
13 / 25

The same methodology extends to testing for independence between two factors with S and T levels, respectively. The margins are used in the same way to estimate the expected counts under the null.

Theory. The resulting test statistic has asymptotically (n → ∞) the chi-squared distribution with (S - 1)(T - 1) degrees of freedom.
14 / 25
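The estimation of the marginals and the resulting test statistic can be sketched as follows (the data are simulated, purely for illustration):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
n = 500

# Simulated paired throws of two independent fair dice (illustration only).
x = rng.integers(1, 7, size=n)
y = rng.integers(1, 7, size=n)

# Contingency table of joint counts N_{s,t}.
table = np.zeros((6, 6))
np.add.at(table, (x - 1, y - 1), 1)

# Estimated marginals from the margins of the table.
p_hat = table.sum(axis=1) / n  # \hat p_s = N_{s,.} / n
q_hat = table.sum(axis=0) / n  # \hat q_t = N_{.,t} / n

# Estimated expected counts n \hat p_s \hat q_t and the chi-squared statistic.
expected = n * np.outer(p_hat, q_hat)
D = np.sum((table - expected) ** 2 / expected)
p_value = chi2.sf(D, df=(6 - 1) * (6 - 1))
print(f"D = {D:.2f}, p-value = {p_value:.3f}")
```

The same computation is performed by `scipy.stats.chi2_contingency(table, correction=False)`.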
Fisher's exact test

R. A. Fisher (a great figure in statistics) developed an exact test for 2 × 2 contingency tables (meaning the two categorical variables are binary). He told the following story, the "lady tasting tea", to motivate his test. Here it is (paraphrased):

  A British woman claimed to be able to tell whether milk or tea was added to the cup first. To test her claim, she was given 8 cups of tea, in four of which the milk was added first.

The null hypothesis is that there is no association between the true order of pouring and the woman's guess; the alternative is that there is a positive association (that the odds ratio is greater than 1). The resulting counts are as follows:

              Truth
  Guess   Milk   Tea   Sum
  Milk      3     1     4
  Tea       1     3     4
  Sum       4     4     8
15 / 25

The expected counts are too small to use the chi-squared approximation. What can we do? How can we quantify how accurate the lady's guesses are?

Fisher's idea is to fix the margins (meaning the row sums and the column sums), enumerate all the contingency tables with the same margins, and sum the probabilities of all the tables that are at least as extreme as the observed table. Enumerating all the tables with the observed margins is easy, since there is only one degree of freedom left: we can focus on the top left cell, which determines all the other cells. A table here is at least as extreme as the observed table if its top left cell has a higher count (implying a stronger positive association).
16 / 25

Suppose we have a general 2 × 2 contingency table

          Y = 1    Y = 0    Sum
  X = 1   N_{11}   N_{10}   N_{1.}
  X = 0   N_{01}   N_{00}   N_{0.}
  Sum     N_{.1}   N_{.0}    n

When X and Y are independent, the probability of obtaining such a table, conditioned on having these margins, is

  \binom{N_{1.}}{N_{11}} \binom{N_{0.}}{N_{01}} \Big/ \binom{n}{N_{.1}}

Indeed, the top left cell count is hypergeometric.
17 / 25

In our example, the probability of the observed table is
  \binom{4}{3} \binom{4}{1} \Big/ \binom{8}{4} = 16/70
There is only one more extreme table,

              Truth
  Guess   Milk   Tea   Sum
  Milk      4     0     4
  Tea       0     4     4
  Sum       4     4     8

and it has probability
  \binom{4}{4} \binom{4}{0} \Big/ \binom{8}{4} = 1/70
The p-value is the sum of these:
  16/70 + 1/70 = 17/70 ≈ 0.243
18 / 25

Exact testing for general S × T tables

The procedure extends to contingency tables of any dimensions. Assume the following are given:
  row sums: (m_{s.} : s = 1, ..., S)
  column sums: (m_{.t} : t = 1, ..., T)
The probability of drawing uniformly at random a table M = (m_{st} : s = 1, ..., S; t = 1, ..., T) with these marginal sums is equal to

  \left( \prod_{s=1}^{S} m_{s.}! \right) \left( \prod_{t=1}^{T} m_{.t}! \right) \Big/ \left( n! \prod_{s=1}^{S} \prod_{t=1}^{T} m_{st}! \right)

where n is the sample size, meaning n = \sum_{s,t} m_{st}.
19 / 25
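Using the classic lady-tasting-tea counts (3 milk-first cups identified correctly out of 4), the one-sided p-value can be computed in several equivalent ways; a sketch:

```python
from math import comb
from scipy.stats import fisher_exact, hypergeom

# Lady-tasting-tea counts: rows = guess, columns = truth,
# all margins equal to 4, n = 8 cups in total.
table = [[3, 1], [1, 3]]

# Direct enumeration: sum the probabilities of the observed table
# and of the single more extreme table (top left cell = 4).
p_direct = (comb(4, 3) * comb(4, 1) + comb(4, 4) * comb(4, 0)) / comb(8, 4)

# Same thing via the hypergeometric distribution of the top left cell:
# P(N11 >= 3) with N11 ~ Hypergeom(total=8, successes=4, draws=4).
p_hyper = hypergeom.sf(2, 8, 4, 4)

# And via scipy's Fisher exact test, one-sided (positive association).
_, p_fisher = fisher_exact(table, alternative="greater")

print(p_direct, p_hyper, p_fisher)  # all equal to 17/70 ≈ 0.243
```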
In analogy with Fisher's exact test, we may define a table as being at least as extreme as the one observed if its probability is at least as small as the probability of the observed table. Alternatively, a table may be defined as extreme if it has a test statistic (e.g., Pearson's) at least as extreme as the statistic for the observed table.

The main issue is computational: enumerating all tables with given margins may be prohibitive, as their number grows very fast with the number of cells and the magnitude of the counts.
20 / 25

Calibration by permutation

Fisher's method is based on the permutation distribution with the margins fixed. Under the null hypothesis, X_i and Y_i are independent. In particular, for any permutation π of {1, ..., n}, the permuted data
  (X_1, Y_{π_1}), ..., (X_n, Y_{π_n})
has the same distribution as the original data
  (X_1, Y_1), ..., (X_n, Y_n)
Therefore, under the null, any test statistic
  D = Λ[(X_1, Y_1), ..., (X_n, Y_n)]
has the same distribution after permutation, meaning that for any permutation π,
  D_π = Λ[(X_1, Y_{π_1}), ..., (X_n, Y_{π_n})]
has the same distribution as D under the null.
21 / 25

Suppose that we reject for large values of Λ, and define
  P = #{π : D_π ≥ D_obs} / n!
P is the fraction of permuted statistics that are at least as extreme as the observed one. P is a valid p-value, in the sense that
  P_0(P ≤ p) ≤ p, for all p ∈ (0, 1)
In fact, if all the D_π are distinct, then under the null P is uniformly distributed over {k/n! : k = 1, ..., n!}.
22 / 25
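A Monte Carlo version of this permutation calibration can be sketched as follows; the data, the choice of Pearson's statistic, and the number of permutations are all made up for illustration:

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(3)
n = 200

def pearson_stat(x, y, S=6, T=6):
    """Pearson chi-squared statistic for the paired sample (x, y)."""
    table = np.zeros((S, T))
    np.add.at(table, (x - 1, y - 1), 1)
    return chi2_contingency(table, correction=False)[0]

# Simulated paired data (independent fair dice, for illustration).
x = rng.integers(1, 7, size=n)
y = rng.integers(1, 7, size=n)
d_obs = pearson_stat(x, y)

# Monte Carlo permutation p-value: permuting the Y's breaks any
# dependence on X while preserving both marginals.
B = 999
d_perm = np.array([pearson_stat(x, rng.permutation(y)) for _ in range(B)])
p_hat = (np.sum(d_perm >= d_obs) + 1) / (B + 1)
print(f"permutation p-value = {p_hat:.3f}")
```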
In practice, the number (n!) of permutations is too large to compute P exactly. In that case, we estimate P by Monte Carlo sampling. For B a large integer, sample π^1, ..., π^B iid uniform from the permutations of {1, ..., n} and estimate P by
  \hat{P} = (#{b : D_{π^b} ≥ D_obs} + 1) / (B + 1)
It happens that \hat{P} is also a valid p-value.

The parametric bootstrap

The bootstrap offers an alternative method for obtaining a p-value by simulation. It mimics Monte Carlo simulations, replacing the (unknown) marginals with the estimated marginals. Assume without loss of generality that X takes values in {1, ..., S} and Y takes values in {1, ..., T}. Let (p_1, ..., p_S) denote the marginal distribution of X and (q_1, ..., q_T) the marginal distribution of Y.
23 / 25

Let \hat{p}_s denote the MLE for p_s, meaning
  \hat{p}_s = #{i : X_i = s} / n
Let \hat{q}_t denote the MLE for q_t, meaning
  \hat{q}_t = #{i : Y_i = t} / n
24 / 25

Suppose we reject for large values of a test statistic
  D = Λ[(X_1, Y_1), ..., (X_n, Y_n)]
Let B be a large integer.
1. For b = 1, ..., B, do the following:
   (a) Generate a sample of size n, X_1^{(b)}, ..., X_n^{(b)}, from (\hat{p}_1, ..., \hat{p}_S). Generate a sample of size n, Y_1^{(b)}, ..., Y_n^{(b)}, from (\hat{q}_1, ..., \hat{q}_T).
   (b) Compute
     D_b = Λ[(X_1^{(b)}, Y_1^{(b)}), ..., (X_n^{(b)}, Y_n^{(b)})]
2. The estimated p-value is
  (#{b : D_b ≥ D_obs} + 1) / (B + 1)
25 / 25
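The parametric bootstrap procedure above can be sketched as follows; the data and the choice of Pearson's statistic are made up for illustration:

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(4)
n = 200
faces = np.arange(1, 7)

def pearson_stat(x, y, S=6, T=6):
    """Pearson chi-squared statistic for the paired sample (x, y)."""
    table = np.zeros((S, T))
    np.add.at(table, (x - 1, y - 1), 1)
    return chi2_contingency(table, correction=False)[0]

# Observed paired sample (simulated here, for illustration).
x = rng.integers(1, 7, size=n)
y = rng.integers(1, 7, size=n)
d_obs = pearson_stat(x, y)

# MLEs of the marginal distributions.
p_hat = np.bincount(x, minlength=7)[1:] / n
q_hat = np.bincount(y, minlength=7)[1:] / n

# Parametric bootstrap: resample X and Y independently from the
# estimated marginals, which enforces the null of independence.
B = 999
d_boot = np.empty(B)
for b in range(B):
    xb = rng.choice(faces, size=n, p=p_hat)
    yb = rng.choice(faces, size=n, p=q_hat)
    d_boot[b] = pearson_stat(xb, yb)

p_value = (np.sum(d_boot >= d_obs) + 1) / (B + 1)
print(f"bootstrap p-value = {p_value:.3f}")
```

The permutation approach conditions on the observed margins exactly, while the bootstrap plugs in the estimated marginals; both calibrate the same statistic under the null of independence.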
More informationIntroduction to Statistical Analysis. Cancer Research UK 12 th of February 2018 D.-L. Couturier / M. Eldridge / M. Fernandes [Bioinformatics core]
Introduction to Statistical Analysis Cancer Research UK 12 th of February 2018 D.-L. Couturier / M. Eldridge / M. Fernandes [Bioinformatics core] 2 Timeline 9:30 Morning I I 45mn Lecture: data type, summary
More informationPHYS 275 Experiment 2 Of Dice and Distributions
PHYS 275 Experiment 2 Of Dice and Distributions Experiment Summary Today we will study the distribution of dice rolling results Two types of measurement, not to be confused: frequency with which we obtain
More informationSTATISTICS SYLLABUS UNIT I
STATISTICS SYLLABUS UNIT I (Probability Theory) Definition Classical and axiomatic approaches.laws of total and compound probability, conditional probability, Bayes Theorem. Random variable and its distribution
More informationConfidence Intervals, Testing and ANOVA Summary
Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0
More informationCourse: ESO-209 Home Work: 1 Instructor: Debasis Kundu
Home Work: 1 1. Describe the sample space when a coin is tossed (a) once, (b) three times, (c) n times, (d) an infinite number of times. 2. A coin is tossed until for the first time the same result appear
More informationSection 3 : Permutation Inference
Section 3 : Permutation Inference Fall 2014 1/39 Introduction Throughout this slides we will focus only on randomized experiments, i.e the treatment is assigned at random We will follow the notation of
More informationComputational Systems Biology: Biology X
Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#7:(Mar-23-2010) Genome Wide Association Studies 1 The law of causality... is a relic of a bygone age, surviving, like the monarchy,
More informationStatistics for Managers Using Microsoft Excel/SPSS Chapter 4 Basic Probability And Discrete Probability Distributions
Statistics for Managers Using Microsoft Excel/SPSS Chapter 4 Basic Probability And Discrete Probability Distributions 1999 Prentice-Hall, Inc. Chap. 4-1 Chapter Topics Basic Probability Concepts: Sample
More informationProbability and Statistics Notes
Probability and Statistics Notes Chapter Seven Jesse Crawford Department of Mathematics Tarleton State University Spring 2011 (Tarleton State University) Chapter Seven Notes Spring 2011 1 / 42 Outline
More information(c) P(BC c ) = One point was lost for multiplying. If, however, you made this same mistake in (b) and (c) you lost the point only once.
Solutions to First Midterm Exam, Stat 371, Fall 2010 There are two, three or four versions of each problem. The problems on your exam comprise a mix of versions. As a result, when you examine the solutions
More informationSpring 2012 Math 541B Exam 1
Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote
More informationBTRY 4090: Spring 2009 Theory of Statistics
BTRY 4090: Spring 2009 Theory of Statistics Guozhang Wang September 25, 2010 1 Review of Probability We begin with a real example of using probability to solve computationally intensive (or infeasible)
More informationChapter 11. Hypothesis Testing (II)
Chapter 11. Hypothesis Testing (II) 11.1 Likelihood Ratio Tests one of the most popular ways of constructing tests when both null and alternative hypotheses are composite (i.e. not a single point). Let
More informationTesting for Poisson Behavior
Testing for Poisson Behavior Philip B. Stark Department of Statistics, UC Berkeley joint with Brad Luen 17 April 2012 Seismological Society of America Annual Meeting San Diego, CA Quake Physics versus
More informationExam 2 Practice Questions, 18.05, Spring 2014
Exam 2 Practice Questions, 18.05, Spring 2014 Note: This is a set of practice problems for exam 2. The actual exam will be much shorter. Within each section we ve arranged the problems roughly in order
More information18.05 Practice Final Exam
No calculators. 18.05 Practice Final Exam Number of problems 16 concept questions, 16 problems. Simplifying expressions Unless asked to explicitly, you don t need to simplify complicated expressions. For
More informationDiscrete Random Variables
Discrete Random Variables An Undergraduate Introduction to Financial Mathematics J. Robert Buchanan Introduction The markets can be thought of as a complex interaction of a large number of random processes,
More information