Outline. Unit 3: Inferential Statistics for Continuous Data. Outline. Inferential statistics for continuous data. Inferential statistics Preliminaries
|
|
- Jerome Cuthbert Ross
- 6 years ago
- Views:
Transcription
1 Unit 3: Inferential Statistics for Continuous Data Statistics for Linguists with R A SIGIL Course Designed by Marco Baroni 1 and Stefan Evert 1 Center for Mind/Brain Sciences (CIMeC) University of Trento, Italy Corpus Linguistics Group Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 1 / 33 SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org / 33 SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 3 / 33 for continuous data Goal: infer (characteristics of) population distribution from small random sample, or test hypotheses about population problem: overwhelmingly infinite coice of possible distributions can estimate/test characteristics such as mean µ and s.d. but H 0 doesn t determine a unique sampling distribution then parametric model, where the population distribution of a r.v. X is completely determined by a small set of parameters In this session, we assume a Gaussian population distribution estimate/test parameters µ and of this distribution sometimes a scale transformation is necessary (e.g. lognormal) Nonparametric tests need fewer assumptions, but... cannot test hypotheses about µ and (instead: median m, IQR = inter-quartile range, etc.) more complicated and computationally expensive procedures correct interpretation of results often difficult SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 4 / 33
2 for continuous data Rationale similar to binomial test for frequency data: measure observed statistic T in sample, which is compared against its expected value E 0 [T ] if difference is large enough, reject H 0 Question 1: What is a suitable statistic? depends on null hypothesis H0 large difference T E 0 [T ] should provide evidence against H 0 e.g. unbiased estimator for population parameter to be tested Question : what is large enough? reject if difference is unlikely to arise by chance need to compute sampling distribution of T under H0 for continuous data Easy if statistic T has a Gaussian distribution T N(µ, ) µ and are determined by null hypothesis H 0 reject H 0 at two-sided significance level α =.05 if T < µ 1.96 or T > µ This suggests a standardized z-score as a measure of extremeness: Z := T µ Central range of sampling variation: Z 1.96 g(t) t µ SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 5 / 33 SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 6 / 33 Notation for random samples Random sample of n m = Ω items e.g. participants of survey, Wikipedia sample,... recall importance of completely random selection Sample described by observed values of r.v. X, Y, Z,...: x 1,..., x n ; y 1,..., y n ; z 1,..., z n specific items ω 1,..., ω n are irrelevant, we are only interested in their properties x i = X (ω i ), y i = Y (ω i ), etc. Mathematically, x i, y i, z i are realisations of random variables X 1,..., X n ; Y 1,..., Y n ; Z 1,..., Z n X 1,..., X n are independent from each other and each one has the same distribution X i X i.i.d. random variables each random experiment now yields complete sample of size n SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 7 / 33 SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 8 / 33
3 A simple test for the mean The sampling distribution of X Consider simplest possible H 0 : a point hypothesis H 0 : µ = µ 0, = 0 together with normality assumption, population distribution is completely determined How would you test whether µ = µ 0 is correct? An intuitive test statistic is the sample mean x = 1 n x i with x µ 0 under H 0 Reject H 0 if difference x µ 0 is sufficiently large need to work out sampling distribution of X The sample mean is also a random variable: X = 1 ( ) X1 + + X n n X is a sensible test statistic for µ because it is unbiased: [ ] E[ X 1 ] = E X i = 1 E[X i ] = 1 µ = µ n n n An important property of the Gaussian distribution: if X N(µ 1, 1 ) and Y N(µ, ) are independent, then X + Y N(µ 1 + µ, 1 + ) r X N(rµ 1, r 1) for r R SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 9 / 33 SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 10 / 33 The sampling distribution of X The z test Since X 1,..., X n are i.i.d. with X i N(µ, ), we have X X n N(nµ, n ) X = 1 ( ) X1 + + X n N(µ, n n ) X has Gaussian distribution with same mean µ but smaller s.d. than the original r.v. X : X = / n explains why normality assumptions are so convenient larger samples allow more reliable hypothesis tests about µ If the sample size n is large enough, X = / n 0 and the sample mean x becomes an accurate estimate of the true population value µ (law of large numbers) SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 11 / 33 Now we can quantify the extremeness of the observed value x, given the null hypothesis H 0 : µ = µ 0, = 0 z = x µ 0 X = x µ 0 0 / n Corresponding r.v. Z has a standard normal distribution if H 0 is correct: Z N(0, 1) We can reject H 0 at significance level α if α = z > qnorm(α/) Two problems of this approach: 1. need to make hypothesis about in order to test µ = µ 0. H 0 might be rejected because of 0 even if µ = µ 0 is true SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 1 / 33
4 A test for the variance An intuitive test statistic for is the error sum of squares V = (X 1 µ) + + (X n µ) Squared error (X µ) is on average E[V ] = n reject = 0 if V n0 (variance larger than expected) reject = 0 if V n0 (variance smaller than expected) sampling distribution of V shows if difference is large enough Rewrite V in the following way: [ (X1 ) V = µ + + = (Z Z n ) ( ) ] Xn µ with Z i N(0, 1) i.i.d. standard normal variables SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 13 / 33 SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 14 / 33 A test for the variance A test for the variance Note that the distribution of Z1 + + Z n does not depend on the population parameters µ and (unlike V ) Statisticians have worked out the distribution of n Z i for i.i.d. Z i N(0, 1), known as the chi-squared distribution Z i χ n with n degrees of freedom (df = n) The χ n distribution has expectation E [ i Z i ] = n and variance Var [ i Z i ] = n confirms E[V ] = n Under H 0 : = 0, we have V 0 = Z Z n χ n Appropriate rejection thresholds for the test statistic V /0 can easily be obtained with R χ n distribution is not symmetric, so one-sided tail probabilities are used (with α = α/ for two-sided test) Again, there are two problems: 1. need to make hypothesis about µ in order to test = 0. H 0 easily rejected for µ µ 0, even though = 0 may be true SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 15 / 33 SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 16 / 33
5 Intermission: Distributions in R Intermission: Distributions in R R can compute density functions and tail probabilities or generate random numbers for a wide range of distributions Systematic naming scheme for such functions: dnorm() pnorm() qnorm() rnorm() density function of Gaussian (normal) distribution tail probability quantile = inverse tail probability generate random numbers Available distributions include Gaussian (norm), chi-squared (chisq), t (t), F (f), binomial (binom), Poisson (pois),... you will encounter many of them later in the course Each function accepts distribution-specific parameters > x <- rnorm(50, mean=100, sd=15) # random sample of 50 IQ scores > hist(x, freq=false, breaks=seq(45,155,10)) # histogram > xg <- seq(45, 155, 1) # theoretical density in steps of 1 IQ point > yg <- dnorm(xg, mean=100, sd=15) > lines(xg, yg, col="blue", lwd=) # What is the probability of an IQ score above 150? # (we need to compute an upper tail probability to answer this question) > pnorm(150, mean=100, sd=15, lower.tail=false) # What does it mean to be among the bottom 5% of the population? > qnorm(.5, mean=100, sd=15) # inverse tail probability SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 17 / 33 SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 18 / 33 Intermission: Distributions in R The sample variance # Now do the same for a chi-squared distribution with 5 degrees of freedom # (hint: the parameter you re looking for is df=5) > xc <- seq(0, 10,.1) > yc <- dchisq(xc, df=5) > plot(xc, yc, type="l", col="blue", lwd=) Idea: replace true µ by sample value X (which is a r.v.!) V = (X 1 X ) + + (X n X ) But there are two problems: X i X N(0, ) not guaranteed because X µ terms are no longer i.i.d. because X depends on all X i # tail probability for i Z i 10 > pchisq(10, df=5, lower.tail=false) # What is the appropriate rejection criterion for a variance test with α = 0.05? > qchisq(.05, df=5, lower.tail=false) # two-sided: V / 0 > n > qchisq(.05, df=5, lower.tail=true) # two-sided: V / 0 < n SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 19 / 33 SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 0 / 33
6 The sample variance The sample variance We can easily work out the distribution of V for n = : V = (X 1 X ) + (X X ) = (X 1 X 1+X ) + (X X 1+X ) = ( X 1 X ) + ( X X 1 ) = 1 (X 1 X ) where X 1 X N(0, ) for i.i.d. X 1, X N(µ, ) Can also show that V and X are independent follows from independence of X 1 X and X 1 + X this is only the case for independent Gaussian variables (Geary 1936, p. 178) SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 1 / 33 We now have ( ) V = X1 X = Z with Z χ 1 because of X 1 X N(0, ) For n > it can be shown that V = n 1 (X i X ) = j=1 with j Z j χ n 1 independent from X proof based on multivariate Gaussian and vector algebra notice that we lose one degree of freedom because one parameter (µ x) has been estimated from the sample SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org / 33 Z j Sample variance and the chi-squared test Sample data for this session This motivates the following definition of sample variance S S = 1 n 1 (X i X ) with sampling distribution (n 1)S / χ n 1 S is an unbiased estimator of variance: E[S ] = We can use S to test H 0 : = 0 without making any assumptions about the true mean µ chi-squared test # Let us take a reproducible sample from the population of Ingary > library(sigil) > Census <- simulated.census() > Survey <- Census[1:100, ] # We will be testing hypotheses about the distribution of body heights > x <- Survey$height # sample data: n items > n <- length(x) Remarks sample variance ( 1 n 1 ) vs. population variance ( 1 m ) χ distribution doesn t have parameters etc., so we need to specify the distribution of S in a roundabout way independence of S and X will play an important role later SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 3 / 33 SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 4 / 33
7 Chi-squared test of variance in R # Chi-squared test for a hypothesis about the s.d. (with unknown mean) # H 0 : = 1 (one-sided test against > 0 ) > sigma0 <- 1 # you can also use the name 0 in a Unicode locale > S <- sum((x - mean(x))^) / (n-1) # unbiased estimator of > S <- var(x) # this should give exactly the same value > X <- (n-1) * S / sigma0^ # has χ distribution under H 0 > pchisq(x, df=n-1, lower.tail=false) # How do you carry out a one-sided test against < 0? # Here s a trick for an approximate two-sided test (try e.g. with 0 = 0) > alt.higher <- S > sigma0^ > * pchisq(x, df=n-1, lower.tail=!alt.higher) SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 5 / 33 SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 6 / 33 for the mean for the mean Now we have the ingredients for a test of H 0 : µ = µ 0 that does not require knowledge of the true variance In the z-score for X Z = X µ 0 / n replace the unknown true s.d. by the unbiased sample estimate ˆ = S, resulting in a so-called t-score: T = X µ 0 S /n William S. Gosset worked out the precise sampling distriution of T and published it under the pseudonym Student Because X and S are independent, we find that T t n 1 under H 0 : µ = µ 0 Student s t distribution with df = n 1 degrees of freedom In order to carry out a one-sample t test, calculate the statistic t = x µ 0 s /n and reject H 0 : µ = µ 0 if t > C Rejection threshold C depends on df = n 1 and desired significance level α (in R: -qt(α/, n 1)) very close to z-score thresholds for n > 30 SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 7 / 33 SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 8 / 33
8 The mathematical magic behind Student s t distribution characterizes the quantity Z V /k t k where Z N(0, 1) and V χ k are independent r.v. T t n 1 under H 0 : µ = µ 0 because the unknown population variance cancels out between the independent r.v. X and S T = X X µ µ 0 0 S /n = with Z = X µ 0 / n = S n X µ 0 / n S = N(0, 1) and V = (n 1)S X µ 0 / n (n 1)S /(n 1) χ n 1 One-sample t test in R # we will use the same sample x of size n as in the previous example # Student s t-test for a hypothesis about the mean (with unknown s.d.) # H 0 : µ = 165 cm > mu0 <- 165 > x.bar <- mean(x) # sample mean x > s <- var(x) # sample variance s > t.score <- (x.bar - mu0) / sqrt(s / n) # t statistic > print(t.score) # positive indicates µ > µ 0, negative µ < µ 0 > -qt(0.05/, n-1) # two-sided rejection threshold for t at α =.05 > * pt(abs(t.score), n-1, lower=false) # two-sided p-value # Mini-task: plot density function of t distribution for different d.f. > t.test(x, mu=165) # agrees with our manual t-test # Note that t.test() also provides a confidence interval for the true µ! SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 9 / 33 SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 30 / 33 If we do not have a specific H 0 to start from, estimate confidence interval for µ or by inverting hypothesis tests in principle same procedure as for binomial confidence intervals implemented in R for t test and chi-squared test Confidence interval has a particularly simple form for the t test Given H 0 : µ = a for some a R, we reject H 0 if x a t = s /n > C with C for α =.05 and n > 30 x C s n µ x + C s n this is the origin of the ± standard deviations rule of thumb SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 31 / 33 SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 3 / 33
9 Can you work out a similar confidence interval for? Test hypotheses H 0 : = a for different values a > 0 Which H 0 are rejected given the observed sample variance s? If H 0 is true, we have the sampling distribution Z := (n 1)S /a χ n 1 Reject H 0 if Z > C 1 or Z < C (not symmetric) Solve inequalities to obtain confidence interval (n 1)s /C 1 (n 1)s /C SIGIL (Baroni & Evert) 3b. Continuous Data: Inference sigil.r-forge.r-project.org 33 / 33
z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests
z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests Chapters 3.5.1 3.5.2, 3.3.2 Prof. Tesler Math 283 Fall 2018 Prof. Tesler z and t tests for mean Math
More informationContents 1. Contents
Contents 1 Contents 1 One-Sample Methods 3 1.1 Parametric Methods.................... 4 1.1.1 One-sample Z-test (see Chapter 0.3.1)...... 4 1.1.2 One-sample t-test................. 6 1.1.3 Large sample
More informationHomework for 1/13 Due 1/22
Name: ID: Homework for 1/13 Due 1/22 1. [ 5-23] An irregularly shaped object of unknown area A is located in the unit square 0 x 1, 0 y 1. Consider a random point distributed uniformly over the square;
More informationStatistical Inference: Estimation and Confidence Intervals Hypothesis Testing
Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire
More informationIntroduction to Statistical Inference
Introduction to Statistical Inference Dr. Fatima Sanchez-Cabo f.sanchezcabo@tugraz.at http://www.genome.tugraz.at Institute for Genomics and Bioinformatics, Graz University of Technology, Austria Introduction
More informationStatistical inference (estimation, hypothesis tests, confidence intervals) Oct 2018
Statistical inference (estimation, hypothesis tests, confidence intervals) Oct 2018 Sampling A trait is measured on each member of a population. f(y) = propn of individuals in the popn with measurement
More informationCh. 7. One sample hypothesis tests for µ and σ
Ch. 7. One sample hypothesis tests for µ and σ Prof. Tesler Math 18 Winter 2019 Prof. Tesler Ch. 7: One sample hypoth. tests for µ, σ Math 18 / Winter 2019 1 / 23 Introduction Data Consider the SAT math
More informationStatistical Inference
Statistical Inference Bernhard Klingenberg Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Outline Estimation: Review of concepts
More informationDover- Sherborn High School Mathematics Curriculum Probability and Statistics
Mathematics Curriculum A. DESCRIPTION This is a full year courses designed to introduce students to the basic elements of statistics and probability. Emphasis is placed on understanding terminology and
More informationSolution E[sum of all eleven dice] = E[sum of ten d20] + E[one d6] = 10 * E[one d20] + E[one d6]
Name: SOLUTIONS Midterm (take home version) To help you budget your time, questions are marked with *s. One * indicates a straight forward question testing foundational knowledge. Two ** indicate a more
More information1 Probability Distributions
1 Probability Distributions A probability distribution describes how the values of a random variable are distributed. For example, the collection of all possible outcomes of a sequence of coin tossing
More informationSTAT 4385 Topic 01: Introduction & Review
STAT 4385 Topic 01: Introduction & Review Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 Outline Welcome What is Regression Analysis? Basics
More informationNormal (Gaussian) distribution The normal distribution is often relevant because of the Central Limit Theorem (CLT):
Lecture Three Normal theory null distributions Normal (Gaussian) distribution The normal distribution is often relevant because of the Central Limit Theorem (CLT): A random variable which is a sum of many
More informationSTAT 513 fa 2018 Lec 02
STAT 513 fa 2018 Lec 02 Inference about the mean and variance of a Normal population Karl B. Gregory Fall 2018 Inference about the mean and variance of a Normal population Here we consider the case in
More informationThis does not cover everything on the final. Look at the posted practice problems for other topics.
Class 7: Review Problems for Final Exam 8.5 Spring 7 This does not cover everything on the final. Look at the posted practice problems for other topics. To save time in class: set up, but do not carry
More informationAn inferential procedure to use sample data to understand a population Procedures
Hypothesis Test An inferential procedure to use sample data to understand a population Procedures Hypotheses, the alpha value, the critical region (z-scores), statistics, conclusion Two types of errors
More informationGlossary. The ISI glossary of statistical terms provides definitions in a number of different languages:
Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the
More informationStatistics for Business and Economics
Statistics for Business and Economics Chapter 6 Sampling and Sampling Distributions Ch. 6-1 6.1 Tools of Business Statistics n Descriptive statistics n Collecting, presenting, and describing data n Inferential
More informationCIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8
CIVL - 7904/8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8 Chi-square Test How to determine the interval from a continuous distribution I = Range 1 + 3.322(logN) I-> Range of the class interval
More informationConfidence Intervals with σ unknown
STAT 141 Confidence Intervals and Hypothesis Testing 10/26/04 Today (Chapter 7): CI with σ unknown, t-distribution CI for proportions Two sample CI with σ known or unknown Hypothesis Testing, z-test Confidence
More informationProbability theory and inference statistics! Dr. Paola Grosso! SNE research group!! (preferred!)!!
Probability theory and inference statistics Dr. Paola Grosso SNE research group p.grosso@uva.nl paola.grosso@os3.nl (preferred) Roadmap Lecture 1: Monday Sep. 22nd Collecting data Presenting data Descriptive
More informationEC2001 Econometrics 1 Dr. Jose Olmo Room D309
EC2001 Econometrics 1 Dr. Jose Olmo Room D309 J.Olmo@City.ac.uk 1 Revision of Statistical Inference 1.1 Sample, observations, population A sample is a number of observations drawn from a population. Population:
More informationCh. 1: Data and Distributions
Ch. 1: Data and Distributions Populations vs. Samples How to graphically display data Histograms, dot plots, stem plots, etc Helps to show how samples are distributed Distributions of both continuous and
More informationPopulation Variance. Concepts from previous lectures. HUMBEHV 3HB3 one-sample t-tests. Week 8
Concepts from previous lectures HUMBEHV 3HB3 one-sample t-tests Week 8 Prof. Patrick Bennett sampling distributions - sampling error - standard error of the mean - degrees-of-freedom Null and alternative/research
More informationStat 139 Homework 2 Solutions, Spring 2015
Stat 139 Homework 2 Solutions, Spring 2015 Problem 1. A pharmaceutical company is surveying through 50 different targeted compounds to try to determine whether any of them may be useful in treating migraine
More informationM(t) = 1 t. (1 t), 6 M (0) = 20 P (95. X i 110) i=1
Math 66/566 - Midterm Solutions NOTE: These solutions are for both the 66 and 566 exam. The problems are the same until questions and 5. 1. The moment generating function of a random variable X is M(t)
More informationChapter 5 Confidence Intervals
Chapter 5 Confidence Intervals Confidence Intervals about a Population Mean, σ, Known Abbas Motamedi Tennessee Tech University A point estimate: a single number, calculated from a set of data, that is
More informationMath 494: Mathematical Statistics
Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/
More informationThe t-distribution. Patrick Breheny. October 13. z tests The χ 2 -distribution The t-distribution Summary
Patrick Breheny October 13 Patrick Breheny Biostatistical Methods I (BIOS 5710) 1/25 Introduction Introduction What s wrong with z-tests? So far we ve (thoroughly!) discussed how to carry out hypothesis
More informationChapter 5: HYPOTHESIS TESTING
MATH411: Applied Statistics Dr. YU, Chi Wai Chapter 5: HYPOTHESIS TESTING 1 WHAT IS HYPOTHESIS TESTING? As its name indicates, it is about a test of hypothesis. To be more precise, we would first translate
More information(Re)introduction to Statistics Dan Lizotte
(Re)introduction to Statistics Dan Lizotte 2017-01-17 Statistics The systematic collection and arrangement of numerical facts or data of any kind; (also) the branch of science or mathematics concerned
More information10/4/2013. Hypothesis Testing & z-test. Hypothesis Testing. Hypothesis Testing
& z-test Lecture Set 11 We have a coin and are trying to determine if it is biased or unbiased What should we assume? Why? Flip coin n = 100 times E(Heads) = 50 Why? Assume we count 53 Heads... What could
More informationTwo-Sample Inferential Statistics
The t Test for Two Independent Samples 1 Two-Sample Inferential Statistics In an experiment there are two or more conditions One condition is often called the control condition in which the treatment is
More informationInference for Single Proportions and Means T.Scofield
Inference for Single Proportions and Means TScofield Confidence Intervals for Single Proportions and Means A CI gives upper and lower bounds between which we hope to capture the (fixed) population parameter
More informationBackground to Statistics
FACT SHEET Background to Statistics Introduction Statistics include a broad range of methods for manipulating, presenting and interpreting data. Professional scientists of all kinds need to be proficient
More informationStat 135, Fall 2006 A. Adhikari HOMEWORK 6 SOLUTIONS
Stat 135, Fall 2006 A. Adhikari HOMEWORK 6 SOLUTIONS 1a. Under the null hypothesis X has the binomial (100,.5) distribution with E(X) = 50 and SE(X) = 5. So P ( X 50 > 10) is (approximately) two tails
More information2.830J / 6.780J / ESD.63J Control of Manufacturing Processes (SMA 6303) Spring 2008
MIT OpenCourseWare http://ocw.mit.edu 2.830J / 6.780J / ESD.63J Control of Processes (SMA 6303) Spring 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More information1 Descriptive statistics. 2 Scores and probability distributions. 3 Hypothesis testing and one-sample t-test. 4 More on t-tests
Overall Overview INFOWO Statistics lecture S3: Hypothesis testing Peter de Waal Department of Information and Computing Sciences Faculty of Science, Universiteit Utrecht 1 Descriptive statistics 2 Scores
More informationExplore the data. Anja Bråthen Kristoffersen
Explore the data Anja Bråthen Kristoffersen density 0.2 0.4 0.6 0.8 Probability distributions Can be either discrete or continuous (uniform, bernoulli, normal, etc) Defined by a density function, p(x)
More informationy ˆ i = ˆ " T u i ( i th fitted value or i th fit)
1 2 INFERENCE FOR MULTIPLE LINEAR REGRESSION Recall Terminology: p predictors x 1, x 2,, x p Some might be indicator variables for categorical variables) k-1 non-constant terms u 1, u 2,, u k-1 Each u
More informationAMS7: WEEK 7. CLASS 1. More on Hypothesis Testing Monday May 11th, 2015
AMS7: WEEK 7. CLASS 1 More on Hypothesis Testing Monday May 11th, 2015 Testing a Claim about a Standard Deviation or a Variance We want to test claims about or 2 Example: Newborn babies from mothers taking
More informationEcon 325: Introduction to Empirical Economics
Econ 325: Introduction to Empirical Economics Lecture 6 Sampling and Sampling Distributions Ch. 6-1 Populations and Samples A Population is the set of all items or individuals of interest Examples: All
More informationInstitute of Actuaries of India
Institute of Actuaries of India Subject CT3 Probability & Mathematical Statistics May 2011 Examinations INDICATIVE SOLUTION Introduction The indicative solution has been written by the Examiners with the
More informationMath 143: Introduction to Biostatistics
Math 143: Introduction to Biostatistics R Pruim Spring 2012 0.2 Last Modified: April 2, 2012 Math 143 : Spring 2012 : Pruim Contents 10 More About Random Variables 1 10.1 The Mean of a Random Variable....................................
More information16.400/453J Human Factors Engineering. Design of Experiments II
J Human Factors Engineering Design of Experiments II Review Experiment Design and Descriptive Statistics Research question, independent and dependent variables, histograms, box plots, etc. Inferential
More informationPrimer on statistics:
Primer on statistics: MLE, Confidence Intervals, and Hypothesis Testing ryan.reece@gmail.com http://rreece.github.io/ Insight Data Science - AI Fellows Workshop Feb 16, 018 Outline 1. Maximum likelihood
More informationIntroductory Statistics with R: Simple Inferences for continuous data
Introductory Statistics with R: Simple Inferences for continuous data Statistical Packages STAT 1301 / 2300, Fall 2014 Sungkyu Jung Department of Statistics University of Pittsburgh E-mail: sungkyu@pitt.edu
More information" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2
Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the
More informationUnit 9 ONE Sample Inference SOLUTIONS
BIOSTATS 540 Fall 017 Introductory Biostatistics Solutions Unit 9 ONE Sample Inference SOLUTIONS 1. This exercise gives you practice calculating a confidence interval for the mean of a Normal distribution
More informationAP Statistics Cumulative AP Exam Study Guide
AP Statistics Cumulative AP Eam Study Guide Chapters & 3 - Graphs Statistics the science of collecting, analyzing, and drawing conclusions from data. Descriptive methods of organizing and summarizing statistics
More informationStatistics. Statistics
The main aims of statistics 1 1 Choosing a model 2 Estimating its parameter(s) 1 point estimates 2 interval estimates 3 Testing hypotheses Distributions used in statistics: χ 2 n-distribution 2 Let X 1,
More informationEE/CpE 345. Modeling and Simulation. Fall Class 10 November 18, 2002
EE/CpE 345 Modeling and Simulation Class 0 November 8, 2002 Input Modeling Inputs(t) Actual System Outputs(t) Parameters? Simulated System Outputs(t) The input data is the driving force for the simulation
More information7.1 Basic Properties of Confidence Intervals
7.1 Basic Properties of Confidence Intervals What s Missing in a Point Just a single estimate What we need: how reliable it is Estimate? No idea how reliable this estimate is some measure of the variability
More informationSection 4.6 Simple Linear Regression
Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval
More informationLecture 10: Sep 24, v2. Hypothesis Testing. Testing Frameworks. James Balamuta STAT UIUC
Lecture 10: Sep 24, 2018 - v2 Hypothesis Testing Testing Frameworks James Balamuta STAT 385 @ UIUC Announcements hw04 is due Friday, Sep 21st, 2018 at 6:00 PM Quiz 05 covers Week 4 contents @ CBTF. Window:
More informationSTAT 135 Lab 5 Bootstrapping and Hypothesis Testing
STAT 135 Lab 5 Bootstrapping and Hypothesis Testing Rebecca Barter March 2, 2015 The Bootstrap Bootstrap Suppose that we are interested in estimating a parameter θ from some population with members x 1,...,
More informationPractice Problems Section Problems
Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,
More informationThe t-test: A z-score for a sample mean tells us where in the distribution the particular mean lies
The t-test: So Far: Sampling distribution benefit is that even if the original population is not normal, a sampling distribution based on this population will be normal (for sample size > 30). Benefit
More informationParameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn!
Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn! Questions?! C. Porciani! Estimation & forecasting! 2! Cosmological parameters! A branch of modern cosmological research focuses
More informationConfidence Intervals, Testing and ANOVA Summary
Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0
More informationReview of Statistics 101
Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods
More informationsimple if it completely specifies the density of x
3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely
More informationSTAT100 Elementary Statistics and Probability
STAT100 Elementary Statistics and Probability Exam, Sample Test, Summer 014 Solution Show all work clearly and in order, and circle your final answers. Justify your answers algebraically whenever possible.
More informationCHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)
FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter
More informationDistribution-Free Procedures (Devore Chapter Fifteen)
Distribution-Free Procedures (Devore Chapter Fifteen) MATH-5-01: Probability and Statistics II Spring 018 Contents 1 Nonparametric Hypothesis Tests 1 1.1 The Wilcoxon Rank Sum Test........... 1 1. Normal
More informationGov 2000: 6. Hypothesis Testing
Gov 2000: 6. Hypothesis Testing Matthew Blackwell October 11, 2016 1 / 55 1. Hypothesis Testing Examples 2. Hypothesis Test Nomenclature 3. Conducting Hypothesis Tests 4. p-values 5. Power Analyses 6.
More informationIEOR 165 Lecture 7 1 Bias-Variance Tradeoff
IEOR 165 Lecture 7 Bias-Variance Tradeoff 1 Bias-Variance Tradeoff Consider the case of parametric regression with β R, and suppose we would like to analyze the error of the estimate ˆβ in comparison to
More informationInferential Statistics
Inferential Statistics Part 1 Sampling Distributions, Point Estimates & Confidence Intervals Inferential statistics are used to draw inferences (make conclusions/judgements) about a population from a sample.
More informationSampling Distributions of Statistics Corresponds to Chapter 5 of Tamhane and Dunlop
Sampling Distributions of Statistics Corresponds to Chapter 5 of Tamhane and Dunlop Slides prepared by Elizabeth Newton (MIT), with some slides by Jacqueline Telford (Johns Hopkins University) 1 Sampling
More informationStatistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing
Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing So, What is Statistics? Theory and techniques for learning from data How to collect How to analyze How to interpret
More informationF79SM STATISTICAL METHODS
F79SM STATISTICAL METHODS SUMMARY NOTES 9 Hypothesis testing 9.1 Introduction As before we have a random sample x of size n of a population r.v. X with pdf/pf f(x;θ). The distribution we assign to X is
More informationSTT 843 Key to Homework 1 Spring 2018
STT 843 Key to Homework Spring 208 Due date: Feb 4, 208 42 (a Because σ = 2, σ 22 = and ρ 2 = 05, we have σ 2 = ρ 2 σ σ22 = 2/2 Then, the mean and covariance of the bivariate normal is µ = ( 0 2 and Σ
More informationTesting Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata
Maura Department of Economics and Finance Università Tor Vergata Hypothesis Testing Outline It is a mistake to confound strangeness with mystery Sherlock Holmes A Study in Scarlet Outline 1 The Power Function
More informationStatistical Data Analysis Stat 3: p-values, parameter estimation
Statistical Data Analysis Stat 3: p-values, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,
More informationName: SOLUTIONS Exam 01 (Midterm Part 2 take home, open everything)
Name: SOLUTIONS Exam 01 (Midterm Part 2 take home, open everything) To help you budget your time, questions are marked with *s. One * indicates a straightforward question testing foundational knowledge.
More informationSpace Telescope Science Institute statistics mini-course. October Inference I: Estimation, Confidence Intervals, and Tests of Hypotheses
Space Telescope Science Institute statistics mini-course October 2011 Inference I: Estimation, Confidence Intervals, and Tests of Hypotheses James L Rosenberger Acknowledgements: Donald Richards, William
More informationEconomics 573 Problem Set 4 Fall 2002 Due: 20 September
Economics 573 Problem Set 4 Fall 2002 Due: 20 September 1. Ten students selected at random have the following "final averages" in physics and economics. Students 1 2 3 4 5 6 7 8 9 10 Physics 66 70 50 80
More informationEstimating the accuracy of a hypothesis Setting. Assume a binary classification setting
Estimating the accuracy of a hypothesis Setting Assume a binary classification setting Assume input/output pairs (x, y) are sampled from an unknown probability distribution D = p(x, y) Train a binary classifier
More informationHypothesis Testing One Sample Tests
STATISTICS Lecture no. 13 Department of Econometrics FEM UO Brno office 69a, tel. 973 442029 email:jiri.neubauer@unob.cz 12. 1. 2010 Tests on Mean of a Normal distribution Tests on Variance of a Normal
More informationRobustness and Distribution Assumptions
Chapter 1 Robustness and Distribution Assumptions 1.1 Introduction In statistics, one often works with model assumptions, i.e., one assumes that data follow a certain model. Then one makes use of methodology
More informationSummary: the confidence interval for the mean (σ 2 known) with gaussian assumption
Summary: the confidence interval for the mean (σ known) with gaussian assumption on X Let X be a Gaussian r.v. with mean µ and variance σ. If X 1, X,..., X n is a random sample drawn from X then the confidence
More informationMath 70 Homework 8 Michael Downs. and, given n iid observations, has log-likelihood function: k i n (2) = 1 n
1. (a) The poisson distribution has density function f(k; λ) = λk e λ k! and, given n iid observations, has log-likelihood function: l(λ; k 1,..., k n ) = ln(λ) k i nλ ln(k i!) (1) and score equation:
More informationIf we want to analyze experimental or simulated data we might encounter the following tasks:
Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction
More informationExercise I.1 I.2 I.3 I.4 II.1 II.2 III.1 III.2 III.3 IV.1 Question (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) Answer
Solutions to Exam in 02402 December 2012 Exercise I.1 I.2 I.3 I.4 II.1 II.2 III.1 III.2 III.3 IV.1 Question (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) Answer 3 1 5 2 5 2 3 5 1 3 Exercise IV.2 IV.3 IV.4 V.1
More informationOne-Sample Numerical Data
One-Sample Numerical Data quantiles, boxplot, histogram, bootstrap confidence intervals, goodness-of-fit tests University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html
More informationLecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series
Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 Chapter 7 Estimates and Sample Sizes 7-1 Overview 7-2 Estimating a Population Proportion 7-3
More information2.1 Linear regression with matrices
21 Linear regression with matrices The values of the independent variables are united into the matrix X (design matrix), the values of the outcome and the coefficient are represented by the vectors Y and
More informationEE/CpE 345. Modeling and Simulation. Fall Class 9
EE/CpE 345 Modeling and Simulation Class 9 208 Input Modeling Inputs(t) Actual System Outputs(t) Parameters? Simulated System Outputs(t) The input data is the driving force for the simulation - the behavior
More informationThe t-statistic. Student s t Test
The t-statistic 1 Student s t Test When the population standard deviation is not known, you cannot use a z score hypothesis test Use Student s t test instead Student s t, or t test is, conceptually, very
More informationConfidence Intervals and Hypothesis Tests
Confidence Intervals and Hypothesis Tests STA 281 Fall 2011 1 Background The central limit theorem provides a very powerful tool for determining the distribution of sample means for large sample sizes.
More informationTopic 15: Simple Hypotheses
Topic 15: November 10, 2009 In the simplest set-up for a statistical hypothesis, we consider two values θ 0, θ 1 in the parameter space. We write the test as H 0 : θ = θ 0 versus H 1 : θ = θ 1. H 0 is
More informationHypothesis Testing. Gordon Erlebacher. Thursday, February 14, 13
Hypothesis Testing Gordon Erlebacher What we have done R basics: - vectors, data frames, - factors, extraction, - logical expressions, scripts, read and writing data files - histograms, plotting Functions
More informationi=1 X i/n i=1 (X i X) 2 /(n 1). Find the constant c so that the statistic c(x X n+1 )/S has a t-distribution. If n = 8, determine k such that
Math 47 Homework Assignment 4 Problem 411 Let X 1, X,, X n, X n+1 be a random sample of size n + 1, n > 1, from a distribution that is N(µ, σ ) Let X = n i=1 X i/n and S = n i=1 (X i X) /(n 1) Find the
More informationexp{ (x i) 2 i=1 n i=1 (x i a) 2 (x i ) 2 = exp{ i=1 n i=1 n 2ax i a 2 i=1
4 Hypothesis testing 4. Simple hypotheses A computer tries to distinguish between two sources of signals. Both sources emit independent signals with normally distributed intensity, the signals of the first
More informationStatistical Inference
Statistical Inference Classical and Bayesian Methods Revision Class for Midterm Exam AMS-UCSC Th Feb 9, 2012 Winter 2012. Session 1 (Revision Class) AMS-132/206 Th Feb 9, 2012 1 / 23 Topics Topics We will
More informationLecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2
Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Fall, 2013 Page 1 Random Variable and Probability Distribution Discrete random variable Y : Finite possible values {y
More informationThe Chi-Square and F Distributions
Department of Psychology and Human Development Vanderbilt University Introductory Distribution Theory 1 Introduction 2 Some Basic Properties Basic Chi-Square Distribution Calculations in R Convergence
More informationAn introduction to biostatistics: part 1
An introduction to biostatistics: part 1 Cavan Reilly September 6, 2017 Table of contents Introduction to data analysis Uncertainty Probability Conditional probability Random variables Discrete random
More informationThe Chi-Square Distributions
MATH 183 The Chi-Square Distributions Dr. Neal, WKU The chi-square distributions can be used in statistics to analyze the standard deviation σ of a normally distributed measurement and to test the goodness
More informationMA20226: STATISTICS 2A 2011/12 Assessed Coursework Sheet One. Set: Lecture 10 12:15-13:05, Thursday 3rd November, 2011 EB1.1
MA20226: STATISTICS 2A 2011/12 Assessed Coursework Sheet One Preamble Set: Lecture 10 12:15-13:05, Thursday 3rd November, 2011 EB1.1 Due: Please hand the work in to the coursework drop-box on level 1 of
More information