Biostatistics 602 - Statistical Inference, Lecture 01: Introduction to BIOSTAT602 and Principles of Data Reduction


Biostatistics 602 - Statistical Inference
Lecture 01: Introduction to BIOSTAT602 and Principles of Data Reduction
Hyun Min Kang
January 10th, 2013

Hyun Min Kang Biostatistics 602 - Lecture 01 January 10th, 2013

Today's Outline

- Course overview of BIOSTAT602
- Principles of data reduction

Basic Polls : Home Department

What is your home department?
- Biostatistics
- Statistics
- Bioinformatics
- Survey Methodology
- Other Departments

Basic Polls : Official Roster

Are you taking the class, or just sitting in?
- Taking for credit
- Sitting in
- Plan to take, but need permission

Basic Polls : 601 History

Have you taken BIOSTAT601 or an equivalent class?
- I took BIOSTAT601
- I took a BIOSTAT601-equivalent class
- I do not have a BIOSTAT601-equivalent background

BIOSTAT602 - Course Information

Instructor
- Name: Hyun Min Kang
- Office: M4531, SPH II
- E-mail: hmkang@umich.edu
- Office hours: Thursday 4:30-5:30pm

Course Web Page
- See http://genome.sph.umich.edu/wiki/602
- No C-Tools site will be available in 2013

BIOSTAT602 - Basic Information

Class Time and Location
- Time: Tuesday and Thursday 1:00-3:00pm
- Location: USB 2260

Prerequisites
- BIOSTAT601 or equivalent knowledge (Chapters 1-5.5 of Casella and Berger)
- Basic calculus and matrix algebra

BIOSTAT602 - Textbooks

Required Textbook
- Statistical Inference, 2nd Edition, by Casella and Berger

Recommended Textbooks
- Statistical Inference, by Garthwaite, Jolliffe and Jones
- All of Statistics: A Concise Course in Statistical Inference, by Wasserman
- Mathematical Statistics: Basic Ideas and Selected Topics, by Bickel and Doksum

Grading

- Homework: 20%
- Midterm: 40%
- Final: 40%

Important Dates

- First lecture: Thursday, January 10th, 2013
- Midterm: 1:00pm - 3:00pm, Thursday, February 21st, 2013
- No lectures on March 5th and 7th (vacation)
- No lecture on April 2nd (instructor out of town)
- Last lecture: Tuesday, April 23rd, 2013 (26 lectures in total)
- Final: 4:00pm - 6:00pm, Thursday, April 25th, 2013 (university-wide schedule)

Honor code

- The honor code is STRONGLY enforced throughout the course.
- The key principle is that all your homework and exams must be your own work.
- See http://www.sph.umich.edu/academics/policies/conduct.html for details.
- You are encouraged to discuss the homework with your colleagues.
- You are NOT allowed to share any piece of your homework with your colleagues, electronically or as a hard copy.
- If a breach of the honor code is identified, your entire homework or exam will be graded as zero, whereas an incomplete homework submission will be considered for partial credit.

About the style of the class

- In previous years, the instructors wrote the notes on the whiteboard or projected them onto a screen during class.
- In this class, we will use prepared slides for the sake of clarity.
- For this reason, the class risks turning into a slot for an after-lunch nap.
- The instructor strongly encourages you to copy the slides by hand during class to digest the material, although all slides will be available online.
- Staying focused during class will help a lot.
- Feedback on the class, especially on the lecture style, would be very much appreciated.

Statistical Inference

Probability in BIOSTAT601
Given some specified probability mass function (pmf) or probability density function (pdf), we can make probabilistic statements about data that could be generated from the model.

Statistical Inference in BIOSTAT602
A process of drawing conclusions or making statements about a population of data based on a random sample of data from the population.

Notations in BIOSTAT602

- $X_1, \ldots, X_n$: random variables, independently and identically distributed (iid) with probability density (or mass) function $f_X(x \mid \theta)$.
- $x_1, \ldots, x_n$: realization of the random variables $X_1, \ldots, X_n$.
- $\mathbf{X} = (X_1, \ldots, X_n)$ is a random sample from a population (typically iid), and the characteristics of this population are described by $f_X(x \mid \theta)$.
- The joint pdf (or pmf) of $\mathbf{X} = (X_1, \ldots, X_n)$, assuming iid, is

\[ f_{\mathbf{X}}(\mathbf{x} \mid \theta) = \prod_{i=1}^{n} f_X(x_i \mid \theta). \]

BIOSTAT601 vs BIOSTAT602

BIOSTAT601
In BIOSTAT601, we assume knowledge of $\theta$ when making probabilistic statements about $X_1, \ldots, X_n$.

BIOSTAT602
In BIOSTAT602, we do not know the true value of the parameter $\theta$; instead, we try to learn about this true parameter value through the observed data $x_1, \ldots, x_n$.

Example of BIOSTAT601 Questions

For a sample size $n$, let $X_1, \ldots, X_n \overset{iid}{\sim} \mathrm{Bernoulli}(p_0)$. What is the probability of $\sum_{i=1}^{n} X_i \le m$?

Since $\sum_{i=1}^{n} X_i \sim \mathrm{Binomial}(n, p_0)$,

\[ \Pr\left( \sum_{i=1}^{n} X_i \le m \right) = \sum_{k=0}^{m} \binom{n}{k} p_0^{k} (1 - p_0)^{n-k}. \]
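The BIOSTAT601-style question above has a known parameter, so the answer is a direct computation. A minimal sketch in Python, assuming illustrative values for n, p_0 and m (the function name `prob_sum_at_most` is hypothetical and not part of the lecture):

```python
from math import comb


def prob_sum_at_most(n: int, p0: float, m: int) -> float:
    """Pr(X_1 + ... + X_n <= m) for X_i iid Bernoulli(p0).

    Sums the Binomial(n, p0) pmf over k = 0, ..., m, exactly as in the
    formula on the slide.
    """
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(m + 1))


if __name__ == "__main__":
    # Illustrative values only: 10 tosses of a fair coin, at most 3 heads.
    print(prob_sum_at_most(n=10, p0=0.5, m=3))
```

The computation needs p_0 as an input, which is exactly what is unavailable in the BIOSTAT602 setting.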

Example of BIOSTAT602 Questions

We assume that the data was generated by a pdf (or pmf) that belongs to a class of pdfs (or pmfs)

\[ \mathcal{P} = \{ f_X(x \mid \theta),\ \theta \in \Omega \subset \mathbb{R}^p \}. \]

For example, $X \sim \mathrm{Bernoulli}(\theta)$, $\theta \in (0, 1) = \Omega \subset \mathbb{R}$.

We collect data in order to
1. Estimate $\theta$ (point estimation)
2. Perform tests of hypotheses about $\theta$
3. Estimate confidence intervals for $\theta$ (interval estimation)
4. Make predictions of future data

BIOSTAT602 : Examples of informal questions

1. Estimate $\theta$ (point estimation): What is the estimated probability of heads, given a series of coin tosses?
2. Perform tests of hypotheses about $\theta$: Given a series of coin tosses, can you tell whether the coin is biased or not?
3. Estimate confidence intervals for $\theta$ (interval estimation): What is the plausible range of the true probability of heads, given a series of coin tosses?
4. Make predictions of future data: Given the series of coin tosses, can you predict the outcome of the next coin toss?

Data Reduction

Data: $x_1, \ldots, x_n$, a realization of the random variables $X_1, \ldots, X_n$.

Define a function of the data

\[ T(x_1, \ldots, x_n) : \mathbb{R}^n \to \mathbb{R}^d. \]

We wish this summary of the data to
1. Be simpler than the original data, e.g. $d \le n$.
2. Keep all the information about $\theta$ that is contained in the original data $x_1, \ldots, x_n$.

Statistic

\[ T(X_1, \ldots, X_n) = T(\mathbf{X}) \]

- It is a function of the random variables $X_1, \ldots, X_n$.
- $T(\mathbf{X})$ itself is also a random variable.
- $T(\mathbf{X})$ defines a form of data reduction or data summary.

Data Reduction

Data reduction in terms of a statistic $T(\mathbf{X})$ is a partition of the sample space $\mathcal{X}$.

Example
Suppose $X_i \overset{iid}{\sim} \mathrm{Bernoulli}(p)$ for $i = 1, 2, 3$, and $0 < p < 1$. Define $T(X_1, X_2, X_3) = X_1 + X_2 + X_3$; then $T : \{0, 1\}^3 \to \{0, 1, 2, 3\}$.

Partition   X_1   X_2   X_3   T(X) = X_1 + X_2 + X_3
A_0         0     0     0     0
A_1         0     1     0     1
            0     0     1     1
            1     0     0     1
A_2         1     0     1     2
            0     1     1     2
            1     1     0     2
A_3         1     1     1     3

Example of Data Reduction

\[ \mathcal{T} = \{ t : t = T(\mathbf{x}) \text{ for some } \mathbf{x} \in \mathcal{X} \} \]
\[ A_t = \{ \mathbf{x} : T(\mathbf{x}) = t \}, \quad t \in \mathcal{T} \]

Instead of reporting $\mathbf{x} = (x_1, x_2, x_3)^{\top}$, we report only $T(\mathbf{X}) = t$, or equivalently $\mathbf{x} \in A_t$.

The partition of the sample space based on $T(\mathbf{X})$ is coarser than the original sample space.
- There are 8 elements in the sample space $\mathcal{X}$.
- They are partitioned into 4 subsets.
- Thus, $T(\mathbf{X})$ is simpler (or coarser) than $\mathbf{X}$.

Sufficient Statistic

Definition 6.2.1
A statistic $T(\mathbf{X})$ is a sufficient statistic for $\theta$ if the conditional distribution of the sample $\mathbf{X}$ given the value of $T(\mathbf{X})$ does not depend on $\theta$.

In other words, the conditional pdf (or pmf) of $\mathbf{X}$ given $T = t$,

\[ f_{\mathbf{X}}(\mathbf{x} \mid T(\mathbf{X}) = t) = h(\mathbf{x}), \]

does not depend on $\theta$.
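The three-toss partition table above can be reproduced by brute-force enumeration. The following is a small sketch, not part of the original slides; the helper name `partition_by_statistic` is hypothetical, and the statistic is taken to be the sum, as in the example:

```python
from collections import defaultdict
from itertools import product


def partition_by_statistic(n, statistic):
    """Group every point of {0, 1}^n by the value of the statistic.

    Returns a dict mapping each value t to the list A_t of sample
    points x with statistic(x) == t.
    """
    cells = defaultdict(list)
    for x in product((0, 1), repeat=n):
        cells[statistic(x)].append(x)
    return dict(cells)


if __name__ == "__main__":
    # T(x) = x_1 + x_2 + x_3, matching the Bernoulli example.
    cells = partition_by_statistic(3, sum)
    for t in sorted(cells):
        print(f"A_{t}: {cells[t]}")
```

Running it lists the 8 sample points grouped into the 4 sets A_0, ..., A_3, which is the sense in which T(X) is coarser than X.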

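Sufficient Statistic : Example

- Suppose $X_1, \ldots, X_n \overset{iid}{\sim} \mathrm{Bernoulli}(p)$, $0 < p < 1$.
- Claim: $T(X_1, \ldots, X_n) = \sum_{i=1}^{n} X_i$ is a sufficient statistic for $p$.

Proof : Overview

- $T(\mathbf{X}) = \sum_{i=1}^{n} X_i \sim \mathrm{Binomial}(n, p)$.
- We need to find the conditional pmf of $\mathbf{X}$ given $T = t$,
- and show that this distribution does not depend on $p$.

Detailed Proof

\[
\Pr\left( X_1 = x_1, \ldots, X_n = x_n \,\Big|\, \sum_{i=1}^{n} X_i = t \right)
= \frac{\Pr\left( X_1 = x_1, \ldots, X_n = x_n,\ \sum_{i=1}^{n} X_i = t \right)}{\Pr\left( \sum_{i=1}^{n} X_i = t \right)}
= \begin{cases}
\dfrac{\Pr(X_1 = x_1, \ldots, X_n = x_n)}{\Pr\left( \sum_{i=1}^{n} X_i = t \right)} & \text{if } \sum_{i=1}^{n} x_i = t \\[2ex]
0 & \text{otherwise}
\end{cases}
\]

Detailed Proof (cont'd)

If $\sum_{i=1}^{n} x_i = t$, then

\[
\Pr(X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^{n} \Pr(X_i = x_i)
= p^{x_1} (1 - p)^{1 - x_1} \cdots p^{x_n} (1 - p)^{1 - x_n}
= p^{\sum_{i=1}^{n} x_i} (1 - p)^{n - \sum_{i=1}^{n} x_i}
= p^{t} (1 - p)^{n - t},
\]

and, because $\sum_{i=1}^{n} X_i \sim \mathrm{Binomial}(n, p)$,

\[
\Pr\left( \sum_{i=1}^{n} X_i = t \right) = \binom{n}{t} p^{t} (1 - p)^{n - t}.
\]

Hence

\[
\Pr\left( \mathbf{X} = \mathbf{x} \,\Big|\, \sum_{i=1}^{n} X_i = t \right)
= \frac{p^{t} (1 - p)^{n - t}}{\binom{n}{t} p^{t} (1 - p)^{n - t}}
= \frac{1}{\binom{n}{t}}.
\]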

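Detailed Proof (cont'd)

Therefore, the conditional distribution is

\[
\Pr\left( \mathbf{X} = \mathbf{x} \,\Big|\, \sum_{i=1}^{n} X_i = t \right)
= \begin{cases}
\dfrac{1}{\binom{n}{t}} & \text{if } \sum_{i=1}^{n} x_i = t \\[2ex]
0 & \text{otherwise}
\end{cases}
\]

Because $\Pr(\mathbf{X} = \mathbf{x} \mid T(\mathbf{X}) = t)$ does not depend on $p$, by definition $T(\mathbf{X}) = \sum_{i=1}^{n} X_i$ is a sufficient statistic for $p$.

Note from the proof

If $\mathbf{x}$ is a sample point such that $T(\mathbf{x}) \ne t$, then $\Pr(\mathbf{X} = \mathbf{x} \mid T(\mathbf{X}) = t) = 0$ always, so we do not have to consider the case $T(\mathbf{x}) \ne t$ in the future.

A Theorem for Sufficient Statistics

Theorem 6.2.2
- Let $f_{\mathbf{X}}(\mathbf{x} \mid \theta)$ be the joint pdf (or pmf) of $\mathbf{X}$, and let $q(t \mid \theta)$ be the pdf (or pmf) of $T(\mathbf{X})$.
- Then $T(\mathbf{X})$ is a sufficient statistic for $\theta$
- if, for every $\mathbf{x} \in \mathcal{X}$,
- the ratio $f_{\mathbf{X}}(\mathbf{x} \mid \theta) / q(T(\mathbf{x}) \mid \theta)$ is constant as a function of $\theta$.

Proof of Theorem 6.2.2 - discrete case

\[
\Pr(\mathbf{X} = \mathbf{x} \mid T(\mathbf{X}) = t)
= \frac{\Pr(\mathbf{X} = \mathbf{x},\ T(\mathbf{X}) = t)}{\Pr(T(\mathbf{X}) = t)}
= \begin{cases}
\dfrac{\Pr(\mathbf{X} = \mathbf{x})}{\Pr(T(\mathbf{X}) = t)} & \text{if } T(\mathbf{x}) = t \\[2ex]
0 & \text{otherwise}
\end{cases}
= \begin{cases}
\dfrac{f_{\mathbf{X}}(\mathbf{x} \mid \theta)}{q(T(\mathbf{x}) \mid \theta)} & \text{if } T(\mathbf{x}) = t \\[2ex]
0 & \text{otherwise}
\end{cases}
\]

which does not depend on $\theta$ by assumption. Therefore, $T(\mathbf{X})$ is a sufficient statistic for $\theta$.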
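The Bernoulli result, and the ratio in Theorem 6.2.2, can be checked numerically for a small sample. The sketch below is only an illustration under assumed values (n = 4 and a few choices of p); the function names are hypothetical. It computes Pr(X = x | sum X_i = t) by brute force over the sample space and compares it with 1 / C(n, t):

```python
from itertools import product
from math import comb, isclose


def joint_pmf(x, p):
    """Joint pmf of iid Bernoulli(p) variables at the sample point x."""
    t = sum(x)
    return p**t * (1 - p) ** (len(x) - t)


def conditional_pmf_given_sum(x, t, p):
    """Pr(X = x | sum X_i = t), computed by enumerating {0, 1}^n."""
    if sum(x) != t:
        return 0.0  # x is outside the partition set A_t
    n = len(x)
    # Pr(sum X_i = t): add the joint pmf over every point in A_t.
    prob_t = sum(joint_pmf(y, p) for y in product((0, 1), repeat=n) if sum(y) == t)
    return joint_pmf(x, p) / prob_t


if __name__ == "__main__":
    x = (1, 0, 1, 0)  # illustrative sample point with t = 2
    for p in (0.1, 0.5, 0.9):
        value = conditional_pmf_given_sum(x, 2, p)
        print(p, value, isclose(value, 1 / comb(4, 2)))
```

The printed conditional probability is the same for every p tried, which is what sufficiency of the sum asserts; a numerical check over a few values is of course not a proof.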

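Example 6.2.3 - Binomial Sufficient Statistic

Problem
- $X_1, \ldots, X_n \overset{iid}{\sim} \mathrm{Bernoulli}(p)$, $0 < p < 1$.
- Show that $T(\mathbf{X}) = \sum_{i=1}^{n} X_i$ is a sufficient statistic for $p$.
- This is the same problem as in the earlier example, but we would like to solve it using Theorem 6.2.2.

Example 6.2.3 - Binomial Sufficient Statistic

Proof

\[
f_{\mathbf{X}}(\mathbf{x} \mid p) = p^{x_1} (1 - p)^{1 - x_1} \cdots p^{x_n} (1 - p)^{1 - x_n}
= p^{\sum_{i=1}^{n} x_i} (1 - p)^{n - \sum_{i=1}^{n} x_i}
\]

\[
T(\mathbf{X}) \sim \mathrm{Binomial}(n, p), \qquad
q(t \mid p) = \binom{n}{t} p^{t} (1 - p)^{n - t}
\]

\[
\frac{f_{\mathbf{X}}(\mathbf{x} \mid p)}{q(T(\mathbf{x}) \mid p)}
= \frac{p^{\sum_{i=1}^{n} x_i} (1 - p)^{n - \sum_{i=1}^{n} x_i}}{\binom{n}{\sum_{i=1}^{n} x_i} p^{\sum_{i=1}^{n} x_i} (1 - p)^{n - \sum_{i=1}^{n} x_i}}
= \frac{1}{\binom{n}{\sum_{i=1}^{n} x_i}}
= \frac{1}{\binom{n}{T(\mathbf{x})}}
\]

The ratio does not depend on $p$, so by Theorem 6.2.2, $T(\mathbf{X})$ is a sufficient statistic for $p$.

Example 6.2.4 - Normal Sufficient Statistic

Problem
- $X_1, \ldots, X_n \overset{iid}{\sim} \mathcal{N}(\mu, \sigma^2)$.
- Assume that $\sigma^2$ is known.
- Show that the sample mean $T(\mathbf{X}) = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ is a sufficient statistic for $\mu$.

Example 6.2.4 - Proof : $f_{\mathbf{X}}(\mathbf{x} \mid \mu)$

\[
f_{\mathbf{X}}(\mathbf{x} \mid \mu)
= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)
= (2\pi\sigma^2)^{-n/2} \exp\left( -\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{2\sigma^2} \right)
\]
\[
= (2\pi\sigma^2)^{-n/2} \exp\left( -\frac{\sum_{i=1}^{n} (x_i - \bar{x} + \bar{x} - \mu)^2}{2\sigma^2} \right)
= (2\pi\sigma^2)^{-n/2} \exp\left( -\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2}{2\sigma^2} \right)
\]

The cross term $2(\bar{x} - \mu) \sum_{i=1}^{n} (x_i - \bar{x})$ vanishes in the last step because $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.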

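Example 6.2.4 - Proof : $q(T(\mathbf{x}) \mid \mu)$

Remember from BIOSTAT601 that $T(\mathbf{X}) = \bar{X} \sim \mathcal{N}(\mu, \sigma^2 / n)$, so

\[
q(T(\mathbf{x}) \mid \mu) = \frac{1}{\sqrt{2\pi\sigma^2 / n}} \exp\left( -\frac{n(\bar{x} - \mu)^2}{2\sigma^2} \right).
\]

Example 6.2.4 - Proof (cont'd) : Putting things together

\[
\frac{f_{\mathbf{X}}(\mathbf{x} \mid \mu)}{q(T(\mathbf{x}) \mid \mu)}
= \frac{(2\pi\sigma^2)^{-n/2} \exp\left( -\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2}{2\sigma^2} \right)}{(2\pi\sigma^2 / n)^{-1/2} \exp\left( -\dfrac{n(\bar{x} - \mu)^2}{2\sigma^2} \right)}
= n^{-1/2} (2\pi\sigma^2)^{-(n-1)/2} \exp\left( -\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{2\sigma^2} \right)
\]

which does not depend on $\mu$. By Theorem 6.2.2, the sample mean is a sufficient statistic for $\mu$.

Summary

Today
- Overview of BIOSTAT602
- Key differences between BIOSTAT601 and BIOSTAT602
- Sufficient statistics
  - $\mathbf{X}$ is conditionally independent of $\theta$ given $T(\mathbf{X})$.
  - If the ratio of the pdfs of $\mathbf{X}$ and $T(\mathbf{X})$ does not depend on $\theta$, then $T(\mathbf{X})$ is a sufficient statistic.

Next Lecture
- More on sufficient statistics
- Factorization Theorem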
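As a supplementary check on Example 6.2.4, the ratio of the joint normal density to the density of the sample mean can be evaluated numerically for several values of µ. This is a sketch under assumed inputs (the data vector, σ² = 1.5, and the function names are illustrative, not from the lecture); it works on the log scale to avoid underflow and compares against the closed form derived in the proof:

```python
import math


def log_joint_pdf(x, mu, sigma2):
    """log f_X(x | mu) for iid N(mu, sigma2) observations."""
    n = len(x)
    sq = sum((xi - mu) ** 2 for xi in x)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - sq / (2 * sigma2)


def log_mean_pdf(xbar, mu, sigma2, n):
    """log q(xbar | mu), using Xbar ~ N(mu, sigma2 / n)."""
    var = sigma2 / n
    return -0.5 * math.log(2 * math.pi * var) - (xbar - mu) ** 2 / (2 * var)


def log_ratio(x, mu, sigma2):
    """log of f_X(x | mu) / q(T(x) | mu), with T(x) the sample mean."""
    n = len(x)
    xbar = sum(x) / n
    return log_joint_pdf(x, mu, sigma2) - log_mean_pdf(xbar, mu, sigma2, n)


def log_ratio_closed_form(x, sigma2):
    """log of n^(-1/2) (2 pi sigma2)^(-(n-1)/2) exp(-sum (x_i - xbar)^2 / (2 sigma2))."""
    n = len(x)
    xbar = sum(x) / n
    ss = sum((xi - xbar) ** 2 for xi in x)
    return (-0.5 * math.log(n)
            - 0.5 * (n - 1) * math.log(2 * math.pi * sigma2)
            - ss / (2 * sigma2))


if __name__ == "__main__":
    data = [0.3, -1.2, 2.5, 0.8, 1.1]  # made-up observations
    for mu in (-3.0, 0.0, 2.5):
        print(mu, log_ratio(data, mu, 1.5), log_ratio_closed_form(data, 1.5))
```

Up to floating-point error, the log ratio is the same for every µ and agrees with the closed form, consistent with the sample mean being sufficient for µ when σ² is known.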