Biostatistics 602 - Statistical Inference
Lecture 01: Introduction to Principles of Data Reduction
Hyun Min Kang
January 10th, 2013

Today's Outline
- Course overview of BIOSTAT602

Basic Polls: Home Department
What is your home department?
- Biostatistics
- Statistics
- Bioinformatics
- Survey Methodology
- Other departments

Basic Polls: Official Roster
Are you taking the class, or just sitting in?
- Taking for credit
- Sitting in
- Plan to take, but need permission

Basic Polls: 601 History
Have you taken BIOSTAT601 or an equivalent class?
- I took BIOSTAT601
- I took a BIOSTAT601-equivalent class
- I do not have a BIOSTAT601-equivalent background

BIOSTAT602 - Course Information
Instructor
- Name: Hyun Min Kang
- Office: M4531, SPH II
- E-mail: hmkang@umich.edu
- Office hours: Thursday 4:30-5:30pm
Course Web Page
- See http://genome.sph.umich.edu/wiki/602
- No C-Tools site will be available in 2013

BIOSTAT602 - Basic Information
Class Time and Location
- Time: Tuesday and Thursday 1:00-3:00pm
- Location: USB 2260
Prerequisites
- BIOSTAT601 or equivalent knowledge (Chapters 1-5.5 of Casella and Berger)
- Basic calculus and matrix algebra

BIOSTAT602 - Textbooks
Required Textbook
- Statistical Inference, 2nd Edition, by Casella and Berger
Recommended Textbooks
- Statistical Inference, by Garthwaite, Jolliffe and Jones
- All of Statistics: A Concise Course in Statistical Inference, by Wasserman
- Mathematical Statistics: Basic Ideas and Selected Topics, by Bickel and Doksum

Grading
- Homework: 20%
- Midterm: 40%
- Final: 40%

Important Dates
- First lecture: Thursday, January 10th, 2013
- Midterm: 1:00pm - 3:00pm, Thursday, February 21st, 2013
- No lectures on March 5th and 7th (vacation)
- No lecture on April 2nd (instructor out of town)
- Last lecture: Tuesday, April 23rd, 2013
- Total of 26 lectures
- Final: 4:00pm - 6:00pm, Thursday, April 25th, 2013 (university-wide schedule)

Honor code
- The honor code is STRONGLY enforced throughout the course.
- The key principle is that all your homework and exams must be your own work.
- See http://www.sph.umich.edu/academics/policies/conduct.html for details.
- You are encouraged to discuss the homework with your colleagues.
- You are NOT allowed to share any piece of your homework with your colleagues, electronically or in hard copy.
- If a breach of the honor code is identified, your entire homework or exam will be graded as zero, whereas an incomplete homework submission will be considered for partial credit.

About the style of the class
- In previous years, the instructors wrote the notes on the whiteboard or projected them onto a screen during class.
- In this class, we will use prepared slides for the sake of clarity.
- For this reason, this class risks serving as a slot for an after-lunch nap.
- The instructor strongly encourages you to copy the slides by hand during class to digest the material, although all slides will be available online.
- Staying focused during class will help a lot.
- Feedback on the class, especially on the lecture style, would be very much appreciated.

Statistical Inference
Probability in BIOSTAT601
- Given some specified probability mass function (pmf) or probability density function (pdf), we can make probabilistic statements about data that could be generated from the model.
Statistical Inference in BIOSTAT602
- A process of drawing conclusions or making statements about a population of data based on a random sample of data from the population.

Notations in BIOSTAT602
- $X_1, \dots, X_n$: random variables, independently and identically distributed (iid) with probability density or mass function $f_X(x \mid \theta)$.
- $x_1, \dots, x_n$: realizations of the random variables $X_1, \dots, X_n$.
- $\mathbf{X} = (X_1, \dots, X_n)$ is a random sample from a population (typically iid), and the characteristics of this population are described by $f_X(x \mid \theta)$.
- The joint pdf (or pmf) of $\mathbf{X} = (X_1, \dots, X_n)$, assuming iid, is

$$f_{\mathbf{X}}(\mathbf{x} \mid \theta) = \prod_{i=1}^{n} f_X(x_i \mid \theta)$$

BIOSTAT601 vs BIOSTAT602
- In BIOSTAT601, we assume knowledge of $\theta$ when making probabilistic statements about $X_1, \dots, X_n$.
- In BIOSTAT602, we do not know the true value of the parameter $\theta$, and instead we try to learn about this true parameter value through the observed data $x_1, \dots, x_n$.

Example of BIOSTAT601 Questions
For a sample size $n$, let $X_1, \dots, X_n \overset{iid}{\sim} \text{Bernoulli}(p_0)$. What is the probability of $\sum_{i=1}^{n} X_i \le m$?

$$\sum_{i=1}^{n} X_i \sim \text{Binomial}(n, p_0)$$

$$\Pr\left(\sum_{i=1}^{n} X_i \le m\right) = \sum_{k=0}^{m} \binom{n}{k} p_0^k (1 - p_0)^{n-k}$$
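
The BIOSTAT601-style question above (known $p_0$, probability of an event about the data) can be sketched numerically. This is a minimal illustration, not part of the original slides; the function name `binomial_cdf` and the values n = 10, m = 3, p0 = 0.5 are assumptions chosen only for the example.

```python
from math import comb


def binomial_cdf(m: int, n: int, p0: float) -> float:
    """Pr(X_1 + ... + X_n <= m) for X_i iid Bernoulli(p0).

    Sums the Binomial(n, p0) pmf over k = 0, ..., m.
    """
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(m + 1))


if __name__ == "__main__":
    # Hypothetical values: 10 fair coin tosses, at most 3 heads.
    print(binomial_cdf(3, 10, 0.5))  # 0.171875 (= 176 / 1024)
```

Here the parameter is treated as known, which is exactly the BIOSTAT601 setting; the inference problem reverses this and asks what the observed sum says about the unknown parameter.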

Example of BIOSTAT602 Questions
We assume that the data were generated by a pdf (or pmf) that belongs to a class of pdfs (or pmfs)

$$\mathcal{P} = \{ f_X(x \mid \theta),\ \theta \in \Omega \subset \mathbb{R}^p \}$$

For example, $X \sim \text{Bernoulli}(\theta)$, $\theta \in (0, 1) = \Omega \subset \mathbb{R}$.
We collect data in order to
1. Estimate $\theta$ (point estimation)
2. Perform tests of hypothesis about $\theta$
3. Estimate confidence intervals for $\theta$ (interval estimation)
4. Make predictions of future data

BIOSTAT602 Questions: Examples of informal questions
1. Estimate $\theta$ (point estimation): What is the estimated probability of heads, given a series of coin tosses?
2. Perform tests of hypothesis about $\theta$: Given a series of coin tosses, can you tell whether the coin is biased or not?
3. Estimate confidence intervals for $\theta$ (interval estimation): What is the plausible range of the true probability of heads, given a series of coin tosses?
4. Make predictions of future data: Given the series of coin tosses, can you predict the outcome of the next coin toss?

Statistic
- Data $x_1, \dots, x_n$: realizations of the random variables $X_1, \dots, X_n$.
- Define a function of the data, $T(x_1, \dots, x_n) : \mathbb{R}^n \to \mathbb{R}^d$.
- $T(X_1, \dots, X_n) = T(\mathbf{X})$ is a function of the random variables $X_1, \dots, X_n$.
- $T(\mathbf{X})$ itself is also a random variable.
- $T(\mathbf{X})$ defines a form of data reduction or data summary.

Data Reduction
We wish this summary of the data to
1. Be simpler than the original data, e.g. $d \le n$.
2. Keep all the information about $\theta$ that is contained in the original data $x_1, \dots, x_n$.

Data Reduction
Data reduction in terms of a statistic $T(\mathbf{X})$ is a partition of the sample space $\mathcal{X}$.

Example of Data Reduction
Suppose $X_i \overset{iid}{\sim} \text{Bernoulli}(p)$ for $i = 1, 2, 3$, and $0 < p < 1$. Define $T(X_1, X_2, X_3) = X_1 + X_2 + X_3$; then $T : \{0, 1\}^3 \to \{0, 1, 2, 3\}$.

Partition   X_1  X_2  X_3   T(X) = X_1 + X_2 + X_3
A_0          0    0    0    0
A_1          0    1    0    1
             0    0    1    1
             1    0    0    1
A_2          1    0    1    2
             0    1    1    2
             1    1    0    2
A_3          1    1    1    3

- $\mathcal{T} = \{ t : t = T(\mathbf{x}) \text{ for some } \mathbf{x} \in \mathcal{X} \}$
- $A_t = \{ \mathbf{x} : T(\mathbf{x}) = t \}$, $t \in \mathcal{T}$
- Instead of reporting $\mathbf{x} = (x_1, x_2, x_3)^T$, we report only $T(\mathbf{x}) = t$, or equivalently $\mathbf{x} \in A_t$.
- The partition of the sample space based on $T(\mathbf{X})$ is coarser than the original sample space.
- There are 8 elements in the sample space $\mathcal{X}$. They are partitioned into 4 subsets.
- Thus, $T(\mathbf{X})$ is simpler (or coarser) than $\mathbf{X}$.

Sufficient Statistic
Definition 6.2.1
A statistic $T(\mathbf{X})$ is a sufficient statistic for $\theta$ if the conditional distribution of the sample $\mathbf{X}$ given the value of $T(\mathbf{X})$ does not depend on $\theta$. In other words, the conditional pdf (or pmf) of $\mathbf{X}$ given $T = t$,

$$f_{\mathbf{X}}(\mathbf{x} \mid T(\mathbf{X}) = t) = h(\mathbf{x}),$$

does not depend on $\theta$.
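
The partition induced by $T(X_1, X_2, X_3) = X_1 + X_2 + X_3$ can be reproduced by enumerating the sample space. This is a small sketch, not from the original slides; the helper name `partition_by_statistic` is hypothetical.

```python
from collections import defaultdict
from itertools import product


def partition_by_statistic(n, statistic):
    """Group every point of {0, 1}^n by the value of statistic(x).

    Returns a dict mapping each value t to the cell A_t = {x : T(x) = t}.
    """
    cells = defaultdict(list)
    for x in product((0, 1), repeat=n):
        cells[statistic(x)].append(x)
    return dict(cells)


# T(x) = x_1 + x_2 + x_3 on the 8-point sample space {0, 1}^3.
cells = partition_by_statistic(3, sum)
for t in sorted(cells):
    print(f"A_{t}: {cells[t]}")
```

The 8 sample points fall into the 4 cells $A_0, \dots, A_3$, which is the sense in which reporting $T(\mathbf{x})$ is coarser than reporting $\mathbf{x}$.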
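
Sufficient Statistic: Example
- Suppose $X_1, \dots, X_n \overset{iid}{\sim} \text{Bernoulli}(p)$, $0 < p < 1$.
- Claim that $T(X_1, \dots, X_n) = \sum_{i=1}^{n} X_i$ is a sufficient statistic for $p$.

Proof: Overview
- $T(\mathbf{X}) = \sum_{i=1}^{n} X_i \sim \text{Binomial}(n, p)$.
- We need to find the conditional pmf of $\mathbf{X}$ given $T = t$,
- and show that this distribution does not depend on $p$.

Detailed Proof

$$\Pr\left(X_1 = x_1, \dots, X_n = x_n \,\Big|\, \sum_{i=1}^{n} X_i = t\right) = \frac{\Pr\left(X_1 = x_1, \dots, X_n = x_n, \sum_{i=1}^{n} X_i = t\right)}{\Pr\left(\sum_{i=1}^{n} X_i = t\right)}$$

$$= \begin{cases} \dfrac{\Pr(X_1 = x_1, \dots, X_n = x_n)}{\Pr\left(\sum_{i=1}^{n} X_i = t\right)} & \text{if } \sum_{i=1}^{n} x_i = t \\ 0 & \text{otherwise} \end{cases}$$

Detailed Proof (cont'd)
If $\sum_{i=1}^{n} x_i = t$, then, using $\sum_{i=1}^{n} X_i \sim \text{Binomial}(n, p)$,

$$\Pr\left(\mathbf{X} = \mathbf{x} \,\Big|\, \sum_{i=1}^{n} X_i = t\right) = \frac{\Pr(X_1 = x_1, \dots, X_n = x_n)}{\Pr\left(\sum_{i=1}^{n} X_i = t\right)} = \frac{\prod_{i=1}^{n} \Pr(X_i = x_i)}{\binom{n}{t} p^t (1-p)^{n-t}}$$

$$= \frac{p^{x_1}(1-p)^{1-x_1} \cdots p^{x_n}(1-p)^{1-x_n}}{\binom{n}{t} p^t (1-p)^{n-t}} = \frac{p^{\sum x_i}(1-p)^{n - \sum x_i}}{\binom{n}{t} p^t (1-p)^{n-t}} = \frac{1}{\binom{n}{t}}$$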
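
Detailed Proof (cont'd)
Therefore, the conditional distribution is

$$\Pr\left(\mathbf{X} = \mathbf{x} \,\Big|\, \sum_{i=1}^{n} X_i = t\right) = \begin{cases} \dfrac{1}{\binom{n}{t}} & \text{if } \sum_{i=1}^{n} x_i = t \\ 0 & \text{otherwise} \end{cases}$$

Because $\Pr(\mathbf{X} \mid T(\mathbf{X}) = t)$ does not depend on $p$, by definition, $T(\mathbf{X}) = \sum_{i=1}^{n} X_i$ is a sufficient statistic for $p$.

Note from the proof
If $\mathbf{x}$ is a sample point such that $T(\mathbf{x}) \ne t$, then $\Pr(\mathbf{X} = \mathbf{x} \mid T(\mathbf{X}) = t) = 0$ always, so from now on we do not have to consider the case $T(\mathbf{x}) \ne t$.

A Theorem for Sufficient Statistics
Theorem 6.2.2
Let $f_{\mathbf{X}}(\mathbf{x} \mid \theta)$ be the joint pdf (or pmf) of $\mathbf{X}$ and $q(t \mid \theta)$ be the pdf (or pmf) of $T(\mathbf{X})$. Then $T(\mathbf{X})$ is a sufficient statistic for $\theta$ if, for every $\mathbf{x} \in \mathcal{X}$, the ratio $f_{\mathbf{X}}(\mathbf{x} \mid \theta) / q(T(\mathbf{x}) \mid \theta)$ is constant as a function of $\theta$.

Proof of Theorem 6.2.2 - discrete case

$$\Pr(\mathbf{X} = \mathbf{x} \mid T(\mathbf{X}) = t) = \frac{\Pr(\mathbf{X} = \mathbf{x}, T(\mathbf{X}) = t)}{\Pr(T(\mathbf{X}) = t)} = \begin{cases} \dfrac{\Pr(\mathbf{X} = \mathbf{x})}{\Pr(T(\mathbf{X}) = t)} & \text{if } T(\mathbf{x}) = t \\ 0 & \text{otherwise} \end{cases}$$

$$= \begin{cases} \dfrac{f_{\mathbf{X}}(\mathbf{x} \mid \theta)}{q(T(\mathbf{x}) \mid \theta)} & \text{if } T(\mathbf{x}) = t \\ 0 & \text{otherwise} \end{cases}$$

which does not depend on $\theta$ by assumption. Therefore, $T(\mathbf{X})$ is a sufficient statistic for $\theta$.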
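
The Bernoulli result can be checked by brute force: compute the conditional pmf directly from the definition and compare it with $1/\binom{n}{t}$ for several values of $p$. This is only a numerical sanity check under assumed values (n = 4, t = 2, and the three values of p are arbitrary), not a substitute for the proof; the function name `conditional_pmf` is hypothetical.

```python
from itertools import product
from math import comb, isclose


def conditional_pmf(x, t, p):
    """Pr(X = x | sum X_i = t) for X_i iid Bernoulli(p), by enumeration."""
    n = len(x)

    def joint(y):
        # Joint pmf of an iid Bernoulli(p) sample: p^(sum y) (1-p)^(n - sum y).
        s = sum(y)
        return p**s * (1 - p) ** (n - s)

    if sum(x) != t:
        return 0.0  # x is outside the cell A_t
    # Pr(sum X_i = t), summed over every sample point in the cell A_t.
    denom = sum(joint(y) for y in product((0, 1), repeat=n) if sum(y) == t)
    return joint(x) / denom


x = (1, 0, 1, 0)
values = [conditional_pmf(x, 2, p) for p in (0.1, 0.5, 0.9)]
# Every entry should agree with 1 / C(4, 2), whatever p is.
print(values, 1 / comb(4, 2))
```

The conditional probability is the same for every $p$ tried, which is what sufficiency of $\sum X_i$ asserts.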
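
Example 6.2.3 - Binomial Sufficient Statistic
Problem
- $X_1, \dots, X_n \overset{iid}{\sim} \text{Bernoulli}(p)$, $0 < p < 1$.
- Show that $T(\mathbf{X}) = \sum_{i=1}^{n} X_i$ is a sufficient statistic for $p$.
- This is the same problem as in the earlier example, but we would like to solve it using Theorem 6.2.2.

Example 6.2.3 - Binomial Sufficient Statistic
Proof

$$f_{\mathbf{X}}(\mathbf{x} \mid p) = p^{x_1}(1-p)^{1-x_1} \cdots p^{x_n}(1-p)^{1-x_n} = p^{\sum x_i}(1-p)^{n - \sum x_i}$$

$$T(\mathbf{X}) \sim \text{Binomial}(n, p), \qquad q(t \mid p) = \binom{n}{t} p^t (1-p)^{n-t}$$

$$\frac{f_{\mathbf{X}}(\mathbf{x} \mid p)}{q(T(\mathbf{x}) \mid p)} = \frac{p^{\sum x_i}(1-p)^{n - \sum x_i}}{\binom{n}{\sum x_i} p^{\sum x_i}(1-p)^{n - \sum x_i}} = \frac{1}{\binom{n}{\sum x_i}} = \frac{1}{\binom{n}{T(\mathbf{x})}}$$

By Theorem 6.2.2, $T(\mathbf{X})$ is a sufficient statistic for $p$.

Example 6.2.4 - Normal Sufficient Statistic
Problem
- $X_1, \dots, X_n \overset{iid}{\sim} \mathcal{N}(\mu, \sigma^2)$. Assume that $\sigma^2$ is known.
- Show that the sample mean $T(\mathbf{X}) = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is a sufficient statistic for $\mu$.

Example 6.2.4 - Proof

$$f_{\mathbf{X}}(\mathbf{x} \mid \mu) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{\sum_{i=1}^{n}(x_i - \mu)^2}{2\sigma^2}\right)$$

$$= (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{\sum_{i=1}^{n}(x_i - \bar{x} + \bar{x} - \mu)^2}{2\sigma^2}\right) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - \mu)^2}{2\sigma^2}\right)$$

The cross term vanishes in the last step because $\sum_{i=1}^{n}(x_i - \bar{x}) = 0$.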

Example 6.2.4 - Proof (cont'd)
Remember from BIOSTAT601 that $T(\mathbf{X}) = \bar{X} \sim \mathcal{N}(\mu, \sigma^2/n)$, so

$$q(T(\mathbf{x}) \mid \mu) = \frac{1}{\sqrt{2\pi\sigma^2/n}} \exp\left(-\frac{n(\bar{x} - \mu)^2}{2\sigma^2}\right)$$

Example 6.2.4 - Proof (cont'd)
Putting things together,

$$\frac{f_{\mathbf{X}}(\mathbf{x} \mid \mu)}{q(T(\mathbf{x}) \mid \mu)} = \frac{(2\pi\sigma^2)^{-n/2} \exp\left(-\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - \mu)^2}{2\sigma^2}\right)}{(2\pi\sigma^2/n)^{-1/2} \exp\left(-\dfrac{n(\bar{x} - \mu)^2}{2\sigma^2}\right)}$$

$$= n^{-1/2} (2\pi\sigma^2)^{-(n-1)/2} \exp\left(-\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{2\sigma^2}\right)$$

which does not depend on $\mu$. By Theorem 6.2.2, the sample mean is a sufficient statistic for $\mu$.

Summary
Today
- Overview of BIOSTAT602
- Key differences between BIOSTAT601 and BIOSTAT602
- Sufficient statistic
  - $\mathbf{X}$ is conditionally independent of $\theta$ given $T(\mathbf{X})$.
  - If the ratio of the pdfs of $\mathbf{X}$ and $T(\mathbf{X})$ does not depend on $\theta$, then $T(\mathbf{X})$ is a sufficient statistic.
Next Lecture
- More on sufficient statistics
- Factorization Theorem
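
The conclusion of Example 6.2.4 can be sanity-checked numerically: the log of the ratio $f_{\mathbf{X}}(\mathbf{x} \mid \mu) / q(\bar{x} \mid \mu)$ should be the same for every $\mu$ and should match the closed form derived in the proof. The sketch below is an illustration only; the data vector, the variance 1.5, and the function names are assumptions made for the example.

```python
import math


def log_joint(x, mu, sigma2):
    """Log joint density of an iid N(mu, sigma2) sample x."""
    n = len(x)
    sq = sum((xi - mu) ** 2 for xi in x)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - sq / (2 * sigma2)


def log_q(xbar, n, mu, sigma2):
    """Log density of the sample mean, which is N(mu, sigma2 / n)."""
    var = sigma2 / n
    return -0.5 * math.log(2 * math.pi * var) - (xbar - mu) ** 2 / (2 * var)


def log_ratio(x, mu, sigma2):
    """log f_X(x | mu) - log q(xbar | mu), the quantity in Theorem 6.2.2."""
    n = len(x)
    xbar = sum(x) / n
    return log_joint(x, mu, sigma2) - log_q(xbar, n, mu, sigma2)


def log_ratio_closed_form(x, sigma2):
    """Log of n^(-1/2) (2 pi sigma2)^(-(n-1)/2) exp(-sum (x_i - xbar)^2 / (2 sigma2))."""
    n = len(x)
    xbar = sum(x) / n
    ss = sum((xi - xbar) ** 2 for xi in x)
    return (-0.5 * math.log(n)
            - 0.5 * (n - 1) * math.log(2 * math.pi * sigma2)
            - ss / (2 * sigma2))


x = [1.2, -0.4, 2.5, 0.9]  # hypothetical data
sigma2 = 1.5               # assumed known variance
ratios = [log_ratio(x, mu, sigma2) for mu in (-2.0, 0.0, 3.5)]
print(ratios, log_ratio_closed_form(x, sigma2))
```

Working on the log scale keeps the comparison numerically stable; the three values agree with each other and with the closed form up to floating-point rounding.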