Hypothesis testing and the Gamma function

Similar documents
This is the end. Kenneth A. Ribet. Math 10A December 1, 2016

z-test, t-test Kenneth A. Ribet Math 10A November 28, 2017

The remains of the course

Methods of Mathematics

Probability concepts. Math 10A. October 33, 2017

Intro to probability concepts

Course Review. Kin 304W Week 14: April 9, 2013

Substitution and change of variables Integration by parts

More on infinite series Antiderivatives and area

Probability. Kenneth A. Ribet. Math 10A After Election Day, 2016

Methods of Mathematics

Project Instructions Please Read Thoroughly!

Differential Equations

Announcements Monday, September 18

Methods of Mathematics

Physics 8 Monday, September 9, 2013

Techniques of Integration: I

Inference for Proportions, Variance and Standard Deviation

Calculus (Math 1A) Lecture 1

Math 147 Lecture Notes: Lecture 12

Math 447. Introduction to Probability and Statistics I. Fall 1998.

Math 2000 Practice Final Exam: Homework problems to review. Problem numbers

June If you want, you may scan your assignment and convert it to a.pdf file and it to me.

CS173 Lecture B, November 3, 2015

Wednesday, 10 September 2008

Outline. Wednesday, 10 September Schedule. Welcome to MA211. MA211 : Calculus, Part 1 Lecture 2: Sets and Functions

Announcements. Lecture 5: Probability. Dangling threads from last week: Mean vs. median. Dangling threads from last week: Sampling bias

Week 12: Optimisation and Course Review.

Physics 121 for Majors

General Chemistry I: Structure

Syllabus. Physics 0847, How Things Work Section II Fall 2014

Section Instructors: by now you should be scheduled into one of the following Sections:

Chapter 23. Inference About Means

Math 250B Midterm I Information Fall 2018

Math 1190/01 Calculus I Fall 2007

Derivatives and series FTW!

MAT01A1. Numbers, Inequalities and Absolute Values. (Appendix A)

Two-sample t-tests. - Independent samples - Pooled standard devation - The equal variance assumption

CHEM 115 Lewis Structures Model

PHY131H1F - Class 20. Today: Today, Chapter 13: General Periodic Oscillations. Special case: Simple Harmonic Motion

Derivatives: definition and first properties

Welcome to Physics 211! General Physics I

Math 1313 Experiments, Events and Sample Spaces

Introductory Probability

Physics 8 Friday, September 6, 2013

STA2601. Tutorial Letter 104/1/2014. Applied Statistics II. Semester 1. Department of Statistics STA2601/104/1/2014 TRIAL EXAMINATION PAPER

ASTR1120L & 2030L Introduction to Astronomical Observations Spring 2019

CBA4 is live in practice mode this week exam mode from Saturday!

11 CHI-SQUARED Introduction. Objectives. How random are your numbers? After studying this chapter you should

AS 101: The Solar System (Spring 2017) Course Syllabus

Thank you for choosing AIMS!

Physics Fall Semester. Sections 1 5. Please find a seat. Keep all walkways free for safety reasons and to comply with the fire code.

Required Textbook. Grade Determined by

Physics 8 Wednesday, November 18, 2015

Hypothesis Testing in Action: t-tests

Dr. Soeren Prell A417 Zaffarano Hall Office Hours: by appointment (just send me a brief )

Math221: HW# 7 solutions

Teaching Linear Algebra, Analytic Geometry and Basic Vector Calculus with Mathematica at Riga Technical University

EXAM. Exam #1. Math 3342 Summer II, July 21, 2000 ANSWERS

MAT01A1. Numbers, Inequalities and Absolute Values. (Appendix A)

Physics 273 (Fall 2013) (4 Credit Hours) Fundamentals of Physics II

Ph 1a Fall General Information

Welcome to Math 10A!!


Mathematical Induction

T has many other desirable properties, and we will return to this example

Welcome to. Session

MAT01A1. Appendix E: Sigma Notation

Daily Update. Dondi Ellis. January 27, 2015

AS The Astronomical Universe. Prof. Merav Opher - Fall 2013

Bell-shaped curves, variance

How to write maths (well)

Welcome to Physics 161 Elements of Physics Fall 2018, Sept 4. Wim Kloet

Important Dates. Non-instructional days. No classes. College offices closed.

PHYS 202. Lecture 24 Professor Stephen Thornton April 27, 2005

Lab Slide Rules and Log Scales

Lecture 1/25 Chapter 2

Preparing for the CS 173 (A) Fall 2018 Midterm 1

Statistics 100 Exam 2 March 8, 2017

CHE 717, Chemical Reaction Engineering, Fall 2016 COURSE SYLLABUS

Life Science 7 Mrs. Duddles. Q1 Rock Cycle, Soil & Ecosystems

review session gov 2000 gov 2000 () review session 1 / 38

AP Calculus AB Summer Assignment


ASTR1120L & 2030L Introduction to Astronomical Observations Fall 2018

Common ontinuous random variables

, p 1 < p 2 < < p l primes.

Stat 20 Midterm 1 Review

Math 440 Project Assignment

Topics in General Chemistry Chemistry 103 Fall 2017

Physics 218 Lecture 13

Nirma University Institute of Technology

Who SHOULD take this course? Course Goals. Beginning of Today s Class. Who am I. Course Goals (more general) 1/16/18

Ch. 2: Lec. 1. Basics: Outline. Importance. Usages. Key problems. Three ways of looking... Colbert on Equations. References. Ch. 2: Lec. 1.

Course: Math 111 Pre-Calculus Summer 2016

Chemistry Syllabus Fall Term 2017

28 (Late Start) 7.2a Substitution. 7.1b Graphing with technology Feb 2. 4 (Late Start) Applications/ Choosing a method

1.1 Administrative Stuff

Problems from Probability and Statistical Inference (9th ed.) by Hogg, Tanis and Zimmerman.

# of 6s # of times Test the null hypthesis that the dice are fair at α =.01 significance

Transcription:

Math 10A November 29, 2016

Announcements Please send me email to sign up for the next two (last two?) breakfasts: This Thursday (December 1) at 9AM. Next Monday (December 5), also at 9AM. Pop-in lunch on Friday, December 2 at 12:10PM (no sign-ups needed, just show up) Coffee at Blue Bottle Coffee Company (University Avenue near Li Ka Shing) on Friday, December 2 at 9AM.

Today we will discuss the Z -test (through examples), meet the Gamma function and (probably) start our discussion of Student s t-testing. The Z -test business is sort of a review of last week s class. The t-subject is the last topic in the Math 10A syllabus. On Thursday, we will discuss t distributions and testing, maybe take a class photo and then have a course evaluation session where I leave and the students fill out their course evaluations online while they re still in the room.

This comes from the math department: If your class meets on Tuesdays and Thursdays you could mention on Tuesday that you will be devoting the last 15 minutes of Thursday s class to the online course evaluations, and you could ask the students to bring their laptop computers to Thursdays class.... It would be a good idea for you to leave the classroom for those last 15 minutes while students are completing their evaluations of your teaching. Students can access the evaluation system through the following link: https://course-evaluations.berkeley.edu. As of 1PM on Tuesday, about 100 students had completed their Math 10A evaluations.

Last Thursday s pop-in lunch (non-vegetarian version)

Z -test example Wikipedia (rephrased): Suppose that in a particular geographic region, the mean and standard deviation of reading test scores are 100 points, and 12 points, respectively.... Fifty-five students in School X... received a mean score of 96.... are the students in this school comparable to a random sample of 55 students from the region as a whole, or are their scores surprisingly low?

The scores of random generic students are given as values of a random variable X with mean µ = 100 and standard deviation σ = 12. If we have N independent copies X i of X (with N = 55, for example), then X = X 1 + + X N N is (approximately) normally distributed with mean µ and standard deviation σ. N As we all saw last Tuesday, Z = (X µ) N σ is then normal(ish) with mean 0 and standard deviation 1. An essentially correct statement is that Z is a standard normal variable in the limit (i.e., for N ).

When we evaluate this random variable on our sample of students, X = 96 (the mean score of the 55 students in question) and N = 55. Thus Z = (96 100) 55 12 2.472. This value occurs far out from the mean (which is 0). The probability of getting a Z that s 2.472 or less can be bounded by using the table on page 566 of Schreiber. Namely, 49.18% of the scores occur between 0 and 2.4; thus fewer than 1% of the scores are larger than 2.4. Symmetrically, fewer than 1% of the scores are smaller than 2.4. Thus the probability of getting 2.472 or a lower score is way less than 1%. (Wikipedia says it s about 0.0068.)

If you prefer two-sided statements, you can say that fewer than 2% of students in a random sample will get adjusted scores that are either greater than 2.472 or less than 2.472. In any case, we re discussing p-values (probabilities) that are less than 5%.

Conclusion? We reject the null hypothesis that the 55 students from the particular school were drawn from a random sample of students in the entire region. Other ways to communicate this information: The students from this school have surprisingly low scores. There is statistical significance to the low scores of the students.

Second example: Example 5.2 from Prob-Stat online notes We see a man flipping a coin and he gets 65 heads out of 100 tosses. If p is the probability of heads, we wish to test whether p 0.5. Why is the coin flipper a man? This example is like that in the HW where 100 tosses led to 70 heads. In that situation, we rejected the null hypothesis that the coin is fair. This is again the null hypothesis, but 65 is less than 70, so the situation is less clear-cut.

If we plug in the data, we get for the numerical value (0.65 0.5) 10 1/2 (X µ) N σ = 0.15 20 = 3. Being three standard deviations away from the mean is an extremely low-probability event. (It s worse than being 2.4 standard deviations away from the mean.) Specifically, being three standard deviations above the mean occurs with probability 1 0.9987 (according to page 566 of the book). This number is about 0.13%. Being at least three standard deviations away from the mean (i.e., three above or three below) occurs with double the probability. The number involved, about.26%, is still minuscule. We reject the null hypothesis. We think that the coin is not fair.

A Penn State example Boys of a certain age have a mean weight of 85 pounds. A complaint is made that the boys living in a municipal children s home are underfed. Twenty-five boys (of this age) are weighed and found to have a mean weight of 80.94 pounds. The population standard deviation is 11.6 pounds.... what should be concluded concerning the complaint? The null hypothesis is that the mean weight of children in this 80.94 85 home is 85 pounds. The statistic Z is 25 = 1.75. 11.6 For a normal distribution with mean 0 and standard deviation 1, the probability of being 1.75 is 0.5 0.4599, according to table 7.3 on page 566 of Schreiber. This number is around 0.04, which is less than 0.05. Hence we reject the null hypothesis and conclude that the complaint is valid.

In quite a few situations involving statistics, one encounters values of the Gamma function Γ(s). Here is a quick tutorial on Γ. Definition: for real numbers s > 0, For example, Γ(s) = 0 t s t dt e t = 0 t s 1 e t dt. Γ(1) = 0 e t dt = e t ] 0 = 1.

Integration by parts: Γ(s) = 0 t s 1 e t dt = t s 1 e t ] 0 = 0 + (s 1)Γ(s 1). + 0 (s 1)t s 2 e t dt Thus, for example: Γ(2) = 1 Γ(1), Γ(3) = 2 Γ(2), Γ(4) = 3 Γ(3),.... Since Γ(1) = 1, we get Γ(2) = 1, Γ(3) = 2, Γ(4) = 6, etc. In general, we see recursively that Γ(n) = (n 1)!. (In 10B, one would prove this by induction.) Another way to write the formula: Γ(s + 1) = sγ(s).

There is a well known duplication formula for the Gamma function that is usually ascribed to Gauss. According to Wikipedia, it is also known as the Legendre duplication formula: Γ(s)Γ(s + 1 2 ) = 21 2s π Γ(2s). Take s = 1 2 : Because Γ(1) = 1, we get Γ( 1 2 )Γ(1) = 20 π Γ(1). Γ( 1 2 ) = π. This is good to know because the π in the formula for the standard Gaussian is often written Γ(1/2) in the literature.

For another example, we can calculate Γ(3/2) in two ways. By the duplication formula, Γ(3/2)Γ(2) = 2 1 3 π Γ(3), so Γ(3/2) = 1 1 1 π π 2 = 4 2. We can also write Γ(3/2) = Γ( 1 2 + 1) = 1 2 Γ(1 2 ) = π 2.

In the Math 10B notes, there s a critical moment where you read:... Recall from last semester what you learned about the Gamma function.... Last semester, I was just barely able to get some of my students to admit that they had been exposed to the Gamma function in 10A. I m hoping that you all will nod enthusiastically next semester when the Gamma function arises. Spoiler: you ll see Gamma values in the formula for the PDF of Chi Squared distributions, whatever they are. (You ll learn about them in 10B.)

In the Math 10B notes, there s a critical moment where you read:... Recall from last semester what you learned about the Gamma function.... Last semester, I was just barely able to get some of my students to admit that they had been exposed to the Gamma function in 10A. I m hoping that you all will nod enthusiastically next semester when the Gamma function arises. Spoiler: you ll see Gamma values in the formula for the PDF of Chi Squared distributions, whatever they are. (You ll learn about them in 10B.)

Student Student was William Sealy Gosset, 1876 1937. He was the inventor of Student s t-test, which we will now explore. The first thing to understand is that there are multiple distributions named after Student; one speaks of Student s t-distribution," but there is a parameter in the distribution, the number of degrees of freedom. This number is ν ( nu ). Because ν can vary (ν = 1, 2, 3...), there is really a family of distributions. The number ν will be one less than the number of samples (N or n, depending on notation) that one draws.

Wikipedia: Whereas a normal distribution describes a full population, t-distributions describe samples drawn from a full population; accordingly, the t-distribution for each sample size is different, and the larger the sample, the more the distribution resembles a normal distribution. What is t? It s a statistic like Z that s defined by a formula that s nearly a clone of the formula Z = (X µ) N. σ The main difference is that t is applied in situations where µ is known but σ is unknown and is replaced by an estimate. Using the estimate in place of the (unkown) true value changes the distribution....

On Thursday we will see how the t-distributions are used for hypothesis testing. Today we are just trying to figure out what the distributions are.

The νth Student PDF is given by C (1 + x 2 ν ) (ν+1)/2, where C is a constant chosen so that the total area under the curve is 1 (as it s supposed to be). If you graph y = (1 + x 2 ν ) (ν+1)/2 for different values of ν, you ll see a function that s symmetrical about the y-axis (because of the x 2 ) and that looks qualitatively like the PDF of the standard normal. As ν, you can check (and please do check!) that (1 + x 2 ν ) (ν+1)/2 e x 2 /2.

To perform the check, you need to know the formula ( lim 1 + a ) n = e a n n when a is a real number. This can be verified (for example) by taking logs and using l Hôpital s Rule.

Graphs of (1 + x 2 ν ) (ν+1)/2 for ν = 1, 2, 3: 1 0.8 0.6 0.4 0.2-3 -2-1 0 1 2 3 Color code: ν = 1, magenta; ν = 2, gold; ν = 3, blue.

The constant C We need to take / C = 1 (1 + x 2 ν ) (ν+1)/2 dx. The easiest case is that where ν = 1. Because the 1 antiderivative of 1 + x 2 is tan 1 (x), dx = π, so C = 1/π. 1 + x 2

In general, This work when ν = 1 because C = 1 Γ( ν+1 2 ) νπ Γ( ν 2 ). Γ( ν + 1 2 ) = Γ(1) = 1, Γ(ν 2 ) = Γ(1 2 ) = π. According to Wikipedia, C involves π when ν is odd but not when ν is even.

About the final It s cumulative but probably will have more on the last third of the course than on the second third, and more on the second third than on the first third. That s only because what s fresh in one s mind usually gets extra attention. The final exam is on Friday night, 7 10PM, exam group 20. However, we don t yet know where it is. Do not show up in 2050 VLSB without first checking the final exam schedule!!!!