In-class exam schedule? 5/10, 5/17, 5/24; in-class exam? 5/31; in-class exam? 6/07; course paper presentations


Testing Hypotheses and Assessing Goodness of Fit

Generalized likelihood ratio tests

The likelihood ratio test is optimal for simple vs. simple hypotheses. Generalized likelihood ratio tests are for use when the hypotheses are not simple. They are not generally optimal, but they are typically used in situations where no optimal test exists, and they usually perform reasonably well. They have wide utility, and they play the same role in testing that MLEs do in estimation.
- Suppose that the observations X = (X_1, ..., X_n) have a joint distribution f(x|θ).
- H_0 specifies that θ ∈ ω_0, where ω_0 is a subset of the set of all possible θ values.
- H_1 specifies that θ ∈ ω_1, where ω_1 is disjoint from ω_0.
- Let Ω = ω_0 ∪ ω_1.
- A plausible measure of the relative tenability of the hypotheses is the ratio of their likelihoods.
- Generalized likelihood ratio: Λ* = [max over θ ∈ ω_0 of lik(θ)] / [max over θ ∈ ω_1 of lik(θ)]; discredit H_0 if Λ* is small.
- In practice, use instead Λ = [max over θ ∈ ω_0 of lik(θ)] / [max over θ ∈ Ω of lik(θ)], and note that Λ = min(Λ*, 1).
- Rejection region: small values of Λ.

Generalized likelihood ratio tests

So far so good, but in order for the likelihood ratio test to have significance level α, λ_0 must be chosen so that P(Λ ≤ λ_0) = α if H_0 is true.
- If the sampling distribution of Λ under H_0 is known, we can determine λ_0.
- Generally, the sampling distribution is not of a simple form. What to do?
A theorem serves as the basis for an approximation to the null distribution: under smoothness conditions, the null distribution of −2 log Λ tends to a chi-square distribution with d.o.f. = dim Ω − dim ω_0 as the sample size tends to infinity. Here dim Ω and dim ω_0 are the numbers of free parameters under Ω and ω_0, respectively.
The Gaussian example above (testing a specified mean, with σ known):
- H_0 completely specifies μ and σ^2, so dim ω_0 = 0.
- Under Ω, σ is fixed and μ is free, so dim Ω = 1.
- The null distribution of −2 log Λ is approximately (here, exactly!) χ^2 with 1 degree of freedom.
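This Gaussian case can be sketched numerically. The sketch below uses simulated data; the sample size, true mean, and null value μ_0 are arbitrary choices, not from the lecture. It computes −2 log Λ directly from the log-likelihoods and checks it against the closed form n(x̄ − μ_0)²/σ², which is exactly χ²_1-distributed under H_0:

```python
import math
import random

random.seed(0)
n, mu0, sigma = 50, 0.0, 1.0
x = [random.gauss(0.3, sigma) for _ in range(n)]   # simulated data
xbar = sum(x) / n

def loglik(mu):
    """Gaussian log-likelihood with known sigma."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (xi - mu)**2 / (2 * sigma**2) for xi in x)

# H0: mu = mu0 (dim omega_0 = 0); under Omega mu is free (dim Omega = 1), MLE = xbar
log_lambda = loglik(mu0) - loglik(xbar)
stat = -2 * log_lambda

# closed form: -2 log Lambda = n (xbar - mu0)^2 / sigma^2, exactly chi^2_1 under H0
z2 = n * (xbar - mu0)**2 / sigma**2
assert abs(stat - z2) < 1e-9
```

Because the identity is algebraically exact here (not just asymptotic), the two quantities agree to rounding error.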

Generalized likelihood ratio tests for the multinomial distribution

H_0 specifies that p = p(θ), θ ∈ ω_0, where θ is a (possibly unknown) parameter; under H_1 the cell probabilities are free except that they are nonnegative and sum to 1. With m cells, Ω is the set of m nonnegative numbers summing to 1. The numerator of the likelihood ratio is maximized by plugging the MLE θ̂ in place of θ. Since the probabilities are unrestricted under Ω, the denominator is maximized by the unrestricted MLEs p̂_i = x_i / n, giving
Λ = ∏_i [p_i(θ̂) / p̂_i]^(x_i) = ∏_i [n p_i(θ̂) / x_i]^(x_i).

Generalized likelihood ratio tests for the multinomial distribution

- Under Ω, the cell probabilities are free except for summing to 1, so dim Ω = m − 1.
- Under H_0, the probabilities depend on a k-dimensional parameter θ, so dim ω_0 = k.
- The large-sample distribution of −2 log Λ is chi-square with d.o.f. = m − k − 1.
Pearson's chi-square statistic, X^2 = Σ_i [x_i − n p_i(θ̂)]^2 / [n p_i(θ̂)], is commonly used to test goodness of fit.

Pearson's chi-square test

The likelihood ratio statistic is −2 log Λ = 2 Σ_i x_i log[x_i / (n p_i(θ̂))]. If H_0 is true and n is large, x_i ≈ n p_i(θ̂). The Taylor series expansion of f(x) = x log(x/x_0) about x_0 is f(x) ≈ (x − x_0) + (x − x_0)^2 / (2 x_0). Applying it with x = x_i and x_0 = n p_i(θ̂),
−2 log Λ ≈ 2 Σ_i [x_i − n p_i(θ̂)] + Σ_i [x_i − n p_i(θ̂)]^2 / [n p_i(θ̂)].
The first sum is 0, because Σ x_i = n and Σ p_i(θ̂) = 1, leaving Pearson's statistic X^2. So Pearson's statistic and the likelihood ratio statistic are asymptotically equivalent under H_0, but Pearson's chi-square test is easier to calculate and more widely used.
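A quick numerical sketch of this equivalence, using made-up cell counts and a fully specified null (p_i = 1/3 for each cell, so no θ is estimated):

```python
import math

counts = [330, 340, 330]          # hypothetical cell counts
n = sum(counts)
p0 = [1/3, 1/3, 1/3]              # null cell probabilities, fully specified
expected = [n * p for p in p0]

# Pearson's statistic and the likelihood ratio statistic
x2 = sum((o - e)**2 / e for o, e in zip(counts, expected))
g2 = 2 * sum(o * math.log(o / e) for o, e in zip(counts, expected))

# asymptotically equivalent under H0: here they differ by well under 0.01
assert abs(x2 - g2) < 0.01
```

With counts this close to their expectations the two statistics are nearly identical; the gap grows when some cells deviate strongly.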

Pearson's chi-square test

Okay, but why do we take the multinomial distribution as H_1? Because, in general, you cannot make the alternative any more general/fuzzy/vague.
H_0: any concrete distribution — all the parameters p_i describing the data in each bin are strongly correlated.
H_1: multinomial distribution — almost no correlation between the parameters p_i describing the data in each bin; # of free parameters = # of outcomes − 1. Way too many!
E.g. Poisson: all the p_i's are determined by a single parameter.

Pearson's chi-square test: examples

Hardy-Weinberg equilibrium. Let's test whether the model fits the data. Under the Hardy-Weinberg model the three cell probabilities are (1 − θ)^2, 2θ(1 − θ), and θ^2, and the MLE for θ is θ̂ = (2 x_3 + x_2) / (2n). Multiplying the resulting probabilities by the sample size n = 1029 gives the expected counts, which we compare with the observations.
H_0: the multinomial distribution is as specified by the Hardy-Weinberg frequencies, with unknown θ.
H_1: the multinomial distribution does not have probabilities of that specified form.


Pearson's chi-square test: examples

Hardy-Weinberg equilibrium, continued. The significance level of the test, α (the probability of falsely rejecting H_0), is set to 0.05 for no particular reason. Large sample: −2 log Λ ~ χ^2 (by the theorem), and −2 log Λ ≈ Pearson's X^2, so Pearson's X^2 ~ χ^2 as well.
D.o.f. = dim Ω (3 cells − 1 constraint of summing to 1) − dim ω_0 (1 parameter) = 2 − 1 = 1.
The upper 5% point of χ^2_1 is 3.84, so the test rejects if X^2 > 3.84. Calculating X^2 = Σ_i (observed_i − expected_i)^2 / expected_i gives a value far below 3.84, so H_0 is not rejected.
Again, there is no compelling reason to choose α = 0.05, and why should we have to make an accept/reject decision at all? Let's use the p-value (the smallest significance level at which H_0 would be rejected). The probability that a χ^2_1 random variable exceeds 0.0319 is 0.86 = p-value: if the model were correct, deviations this large or larger would occur 86% of the time.
- The likelihood ratio test statistic gives essentially the same value.
- The actual maximized likelihood ratio is Λ = e^(−0.0319/2) ≈ 0.98, so the Hardy-Weinberg model is almost as likely as the most general possible model!
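The calculation can be sketched as follows. The three genotype counts below are the classic M-N blood-group data commonly paired with this example; treat them as illustrative, since the slide's own table did not survive transcription:

```python
counts = [342, 500, 187]     # illustrative genotype counts; n = 1029
n = sum(counts)

# MLE under Hardy-Weinberg: theta_hat = (2 x3 + x2) / (2n)
theta = (2 * counts[2] + counts[1]) / (2 * n)
p = [(1 - theta)**2, 2 * theta * (1 - theta), theta**2]
expected = [n * pi for pi in p]

x2 = sum((o - e)**2 / e for o, e in zip(counts, expected))
# dof = (3 - 1) - 1 = 1; upper 5% point of chi^2_1 is 3.84; x2 is about 0.03
assert x2 < 3.84     # H0 (Hardy-Weinberg) is not rejected
```

The observed and expected counts differ by only one or two individuals per cell, which is why X^2 is tiny.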

Pearson's chi-square test: examples

Bacterial clumps. Let's test whether a Poisson distribution fits the data. The MLE gives a Poisson model with λ̂ = x̄.
H_0: Poisson; H_1: multinomial.
The chi-square statistic is X^2 = 75.4, with d.o.f. = dim Ω (8 − 1) − dim ω_0 (1) = 6. The null hypothesis is conclusively rejected (p-value < 0.005). Reason: there are too many small counts and too many large counts for the data to be Poisson.

Poisson dispersion test

The likelihood ratio test and Pearson's chi-square test are built against the most general form of H_1: the cell probabilities are completely free (that is why they are so useful). But if one has a specific H_1 in mind, better power may be obtained by testing against that H_1 rather than against a more general alternative. The Poisson dispersion test is for H_0: the distribution is Poisson. Background, and why it is needed: the key Poisson assumptions (the rate is constant; the counts in one interval of time or space are independent of the counts in disjoint intervals) are often not met.
- Counting insects on the leaves of plants: leaves have different sizes and various locations on different plants, so the rate of infestation may well not be constant over the different locations.
- If insects hatch from eggs that were deposited in groups, then the insects cluster and the independence assumption may fail.
- Motor vehicle counts for traffic studies typically vary cyclically over time.

Poisson dispersion test

Given counts x_1, ..., x_n:
H_0: the counts are Poisson with a common parameter λ. (ω_0)
H_1: the counts are Poisson but with different rates λ_1, ..., λ_n. (Ω)
MLE under ω_0: λ̂ = x̄. MLE under Ω: λ̂_i = x_i. Then
−2 log Λ = 2 Σ_i [x_i log(x_i / x̄) − (x_i − x̄)] ≈ (1/x̄) Σ_i (x_i − x̄)^2 (by Taylor expansion).
D.o.f. = dim Ω − dim ω_0 = n − 1.
The test is sensitive to (= has high power against) alternatives that are overdispersed relative to the Poisson (e.g. the negative binomial distribution). The ratio s^2 / x̄ is sometimes used as a measure of clustering of the data.
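A minimal sketch with made-up counts, computing both the exact likelihood ratio statistic and its Taylor approximation (the dispersion statistic):

```python
import math

counts = [3, 1, 4, 0, 2, 2, 5, 1, 3, 2]     # hypothetical counts
n = len(counts)
xbar = sum(counts) / n

# dispersion statistic: (1/xbar) * sum (x_i - xbar)^2, dof = n - 1
T = sum((x - xbar)**2 for x in counts) / xbar

# exact -2 log Lambda = 2 * sum x_i log(x_i / xbar); a zero count contributes 0
lr = 2 * sum(x * math.log(x / xbar) for x in counts if x > 0)

# with counts this small the Taylor approximation is rough, but both
# statistics would be referred to a chi-square with n - 1 = 9 dof
assert round(T, 2) == 8.74   # here xbar = 2.3 and sum of squares = 20.1
assert lr > 0
```

With larger counts the two statistics converge; the dispersion form is the one usually quoted because it needs only the sample mean and variance.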

Poisson dispersion test

Example. The Poisson dispersion test and the likelihood ratio test give essentially the same result: d.o.f. = 23 − 1 = 22, p-value = 0.21. The evidence against the null hypothesis is not persuasive.

Poisson dispersion test

Example (bacterial clumps). Under H_0, T ~ chi-square with d.o.f. = 399. A chi-square with m degrees of freedom is a sum of m independent squared N(0, 1) variables, so for large m the CLT says it is approximately normal. A chi-square has mean = d.o.f. and variance = 2 × d.o.f., so the p-value can be found by standardizing the statistic: z = (T − 399) / sqrt(2 × 399). The Poisson model doesn't fit!
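The normal approximation to the chi-square p-value can be sketched as follows. The d.o.f. of 399 matches the example, but the value of T below is made up for illustration, since the slide's own statistic did not survive transcription:

```python
import math

def chi2_pvalue_normal_approx(T, dof):
    """Upper-tail p-value via the CLT: chi^2_m has mean m, variance 2m."""
    z = (T - dof) / math.sqrt(2 * dof)
    return 0.5 * math.erfc(z / math.sqrt(2))   # P(Z > z) for standard normal Z

p = chi2_pvalue_normal_approx(T=500.0, dof=399)   # hypothetical T
assert p < 0.01    # a T this far above its mean is strong evidence against Poisson
```

For moderate d.o.f. the normal approximation can be noticeably off in the far tail; here, with 399 degrees of freedom, it is adequate for a qualitative verdict.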

Informal (graphical) techniques for assessing goodness of fit

Probability plots

An extremely useful graphical tool for qualitatively assessing the fit of data to a theoretical distribution. Consider a sample of size n from a uniform distribution on [0, 1]. Denote the ordered sample values by X_(1) < X_(2) < ... < X_(n) (called the order statistics). Question: what is the expectation of X_(j)? Make a guess!

Probability plots

The expectation of X_(j) is E[X_(j)] = j / (n + 1). Plot the ordered observations X_(1), ..., X_(n) against their expected values 1/(n+1), ..., n/(n+1); for a sample from a uniform distribution, the plot should be roughly linear.
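A quick simulation check of E[X_(j)] = j/(n + 1) (a sketch; the sample size and repetition count are arbitrary):

```python
import random

random.seed(1)
n, reps = 5, 20000
sums = [0.0] * n
for _ in range(reps):
    ordered = sorted(random.random() for _ in range(n))   # order statistics
    for j, v in enumerate(ordered):
        sums[j] += v
means = [s / reps for s in sums]

# E[X_(j)] = j/(n+1): for n = 5 that is 1/6, 2/6, ..., 5/6
for j in range(n):
    assert abs(means[j] - (j + 1) / (n + 1)) < 0.01
```

The simulated averages land within Monte Carlo error of the theoretical values, which is exactly what makes j/(n + 1) the natural plotting position.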

Probability plots

Now suppose Y = U_1/2 + U_2/2; then (by convolution) Y's distribution is triangular. Generate Y_1, ..., Y_n and plot Y_(1), ..., Y_(n) against the points 1/(n+1), ..., n/(n+1). The data are triangular but the theoretical model is uniform, so the plot deviates from linearity:
- Left tail: the order statistics are greater than expected for a uniform distribution.
- Right tail: the opposite.
- The tails of the triangular distribution are lighter than those of the uniform.

Probability plots

The technique can be extended to other continuous probability laws this way: if X is a continuous random variable with a strictly increasing cdf F_X, and Y = F_X(X), then Y has a uniform distribution on [0, 1]. The transformation Y = F_X(X) is known as the probability integral transform.
Procedure: suppose X is hypothesized to follow a certain distribution with cdf F. Given a sample X_1, ..., X_n, plot F(X_(k)) versus k/(n + 1), or equivalently X_(k) versus F^(-1)(k/(n + 1)). For a location-scale family such as the normal, with μ the location parameter and σ the scale parameter, the result should be approximately a straight line if the model is correct.
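A sketch of the probability integral transform for an exponential distribution (the sample size and rate are arbitrary choices): applying F to exponential draws should yield values uniform on [0, 1].

```python
import math
import random

random.seed(2)
lam = 2.0    # arbitrary rate
# exponential draws via inverse-cdf sampling; 1 - u is in (0, 1], so log is safe
xs = [-math.log(1 - random.random()) / lam for _ in range(50000)]
# probability integral transform: Y = F(X) = 1 - exp(-lam * X)
ys = [1 - math.exp(-lam * x) for x in xs]

mean = sum(ys) / len(ys)
var = sum((y - mean)**2 for y in ys) / len(ys)
assert abs(mean - 0.5) < 0.01    # uniform[0, 1] has mean 1/2
assert abs(var - 1/12) < 0.005   # and variance 1/12
```

This is why a probability plot for any continuous F reduces to the uniform case: the transformed sample behaves like uniform order statistics.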

Probability plots

Slight modifications of the procedure are sometimes used (e.g. different plotting positions in place of k/(n + 1)). This is because it can be shown that E[F(X_(k))] = k/(n + 1), so approximately E[X_(k)] ≈ F^(-1)[k/(n + 1)], and we can plot X_(k) versus this approximation to E[X_(k)] — back to our first uniform example. Viewed from another perspective, F^(-1)[k/(n + 1)] is the k/(n + 1) quantile of the distribution F: the point such that the probability that a random variable with cdf F is less than it is k/(n + 1). We are thus plotting the ordered observations (which can be viewed as the observed or empirical quantiles) versus the quantiles of the theoretical distribution.

Probability plots: examples

A set of 100 observations: Michelson's determinations of the velocity of light made from June 5, 1879 to July 2, 1879; 299,000 has been subtracted from the determinations to give the values listed [Stigler (1977)]. The theoretical model is Gaussian; the plot is close to a straight line, a qualitatively good fit. Use with caution:
- Probability plots are by nature monotonically increasing, and they all tend to look fairly straight. Some experience is necessary in gauging straightness.
- Simulations are very helpful in sharpening one's judgment.

Probability plots: examples

Nonnormal distributions. Below is a normal probability plot of 500 pseudorandom variables from a double exponential distribution. Its tails die off at the rate exp(−|x|), slower than the Gaussian's exp(−x^2). The plot bends down at the left and up at the right: observations in the left tail are more negative than expected for a Gaussian, and observations in the right tail more positive. The extreme observations are larger in magnitude than extreme observations from a Gaussian: the tails of the double exponential are heavier than those of a Gaussian.

Probability plots: examples

Nonnormal distributions: gamma. A normal probability plot of 500 pseudorandom numbers from a gamma distribution with scale parameter λ = 1 and shape parameter α = 5. The plot is bow-shaped, indicating a skewed/nonsymmetric distribution.

Probability plots: examples

Gamma probability plot of the precipitation amounts. A computer was used to find the quantiles of a gamma distribution with parameters α = .471 and λ = 1. The plot shows the observed sorted values of precipitation versus those quantiles. Qualitatively, the fit is reasonable; there is no gross systematic deviation from a straight line.
Probability plots for grouped data. Suppose that the grouping gives the points x_1, ..., x_(m+1) as the histogram's bin boundaries and that the interval [x_i, x_(i+1)) contains n_i counts, i = 1, ..., m. Denote the cumulative frequencies by N_i = n_1 + ... + n_i. Then N_1 < N_2 < ... < N_m, and N_m = n, the total sample size. One can then plot the bin boundaries x_(i+1) versus the quantiles F^(-1)[N_i / (n + 1)].

Summarizing data

Empirical cdf

Methods of describing and summarizing data that are in the form of one or more samples, or batches. These procedures often generate graphical displays and are useful in revealing the structure of data. In the absence of a stochastic model, the methods are useful for purely descriptive purposes. If it is appropriate to entertain a stochastic model, the implications of that model for the methods are of interest.
"Sample": the x_i are i.i.d. with some distribution function. "Batch": implies no such commitment to a stochastic model.
Suppose x_1, ..., x_n is a batch of numbers. The empirical cdf (ecdf) is defined as
F_n(x) = (1/n) × #{x_i ≤ x}.
Denote the ordered batch of numbers by x_(1) ≤ x_(2) ≤ ... ≤ x_(n). Then if x < x_(1), F_n(x) = 0; if x_(1) ≤ x < x_(2), F_n(x) = 1/n; if x_(k) ≤ x < x_(k+1), F_n(x) = k/n.
A single observation with value x gives F_n a jump of height 1/n at x; r observations with the same value x give F_n a jump of height r/n at x.
F(x) gives the probability that X ≤ x; F_n(x) gives the proportion of the collection of numbers less than or equal to x.
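The definition translates directly into code (the batch values below are made up):

```python
def ecdf(batch, x):
    """F_n(x): the proportion of the batch less than or equal to x."""
    return sum(1 for v in batch if v <= x) / len(batch)

batch = [63.5, 63.8, 64.2, 63.1, 64.0]   # made-up melting points
assert ecdf(batch, 63.8) == 3 / 5        # three of the five values are <= 63.8
assert ecdf(batch, 60.0) == 0.0          # below the smallest value
assert ecdf(batch, 65.0) == 1.0          # at or above the largest value
```

Note that F_n jumps by 1/n at each distinct observation, exactly as described above.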

Empirical cdf: example

Example: White, Riethof & Kushnir (1960), data from a study of the chemical properties of beeswax. The aim of the study was to detect, via chemical tests, the presence of synthetic waxes in beeswax (they affect the melting point). The authors obtained 59 samples of pure beeswax, with melting points (in °C) as listed. The ecdf conveniently summarizes the natural variability in melting points: we see that about 90% of the samples had melting points below 64.2°C, and about 12% below 63.2°C.

Empirical cdf: example

Elementary statistical properties of the ecdf, in the case in which X_1, ..., X_n is a random sample from a continuous cdf F. It is convenient to express F_n as
F_n(x) = (1/n) Σ_i I_(−∞, x](X_i),
where the indicator I_(−∞, x](X_i) equals 1 if X_i ≤ x and 0 otherwise. The random variables I_(−∞, x](X_i) are independent Bernoulli random variables, taking the value 1 with probability F(x) and 0 with probability 1 − F(x). As an estimate of F(x), F_n(x) is therefore unbiased, with variance F(x)[1 − F(x)]/n: the variance is maximal at the value of x for which F(x) = 0.5, that is, at the median, and tends to zero as x becomes large or small.

We considered F_n for fixed x, but a much deeper analysis focuses on the stochastic behavior of F_n as a random function (considering all x values simultaneously). Surprisingly, it turns out that the distribution of the maximum discrepancy
D_n = max over x of |F_n(x) − F(x)|
does not depend on F, provided F is continuous! This result forms the foundation of the famous and extremely widely used Kolmogorov-Smirnov test: the observed maximum difference corresponds to a unique p-value (available from tabulations of the distribution of D_n) for rejecting the hypothesis that the two (e)cdfs come from the same distribution.
1-sample K-S test: data vs. model. 2-sample K-S test: data vs. data.
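A sketch of the one-sample K-S statistic against the uniform model (sample size arbitrary). Since the ecdf is a step function, the supremum of |F_n(x) − F(x)| is attained at the sample points, just before or just after each jump:

```python
import random

random.seed(5)
n = 1000
xs = sorted(random.random() for _ in range(n))   # sample; model F(x) = x on [0, 1]

# D_n = sup_x |F_n(x) - F(x)|, checking both sides of each jump of F_n
D = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

# for data truly drawn from the model, D is of order 1/sqrt(n)
assert 0 < D < 0.1
```

Referring sqrt(n)·D to the tabulated Kolmogorov distribution then yields the p-value; the point of the theorem above is that the same table works for every continuous F.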

Quantile-quantile (Q-Q) plots are useful for comparing distribution functions. The pth quantile of a distribution F was defined before as x_p = F^(-1)(p). In a Q-Q plot, the quantiles of one distribution are plotted against those of another. Let F(x) be the model for the data of a control group and G(y) the model for the data of a treatment group.
Simplest effect #1: the treatment increases the expected response of every member of the treatment group by the same amount h. Then y_p = x_p + h, and the Q-Q plot is a straight line with slope 1 and intercept h. The cdfs are related by G(y) = F(y − h).
Simplest effect #2: the response is multiplied by a constant c. Then the quantiles are related by y_p = c x_p, and the Q-Q plot is a straight line with slope c and intercept 0. The cdfs are related by G(y) = F(y/c).
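The two simple effects can be sketched with simulated data (the shift h and factor c below are arbitrary): shifting or rescaling every response moves the empirical Q-Q points onto the corresponding straight line exactly.

```python
import random

random.seed(4)
x = sorted(random.gauss(0, 1) for _ in range(1000))  # control-group sample
h, c = 2.0, 1.5                                      # hypothetical effects

y_add = sorted(xi + h for xi in x)   # effect #1: add h to every response
y_mul = sorted(xi * c for xi in x)   # effect #2: multiply every response by c

# Q-Q points (X_(i), Y_(i)) fall on y = x + h and on y = c*x, respectively
assert all(abs(ya - (xa + h)) < 1e-9 for xa, ya in zip(x, y_add))
assert all(abs(ym - c * xa) < 1e-9 for xa, ym in zip(x, y_mul))
```

Adding h and multiplying by a positive c both preserve ordering, which is why the i-th order statistic of the treated batch is exactly the transformed i-th order statistic of the control batch.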

Quantile-quantile plots

The effect of a treatment can be much more complicated: an educational program that places very heavy emphasis on elementary skills may be a treatment that benefits weaker individuals but harms stronger individuals.
Given a batch of numbers, or a sample from a pdf, quantiles are constructed from the order statistics. Given n observations with order statistics X_(1), ..., X_(n), the k/(n+1) quantile of the data is assigned to X_(k); we did exactly this for probability plots, to informally assess goodness of fit.
To compare two batches of n numbers with order statistics X_(1), ..., X_(n) and Y_(1), ..., Y_(n), a Q-Q plot is constructed simply by plotting the points (X_(i), Y_(i)). If the batches are of unequal size, an interpolation process can be used.

Quantile-quantile plots

Additive effect vs. multiplicative effect (weekday vs. Sunday pollution data):
- Ozone: the highest quantile occurs on weekdays; all other quantiles are larger on Sundays.
- CO, NO, aerosols: the differences between quantiles grow as concentration increases.
- Solar radiation: the extreme high and low quantiles are nearly the same on Sundays and weekdays.

Histograms, density curves, stem-and-leaf plots

Histograms

A histogram displays the shape of the distribution of data values in the same sense that a density function displays probabilities.
- The range of the data is divided into intervals/bins; plot the counts or the proportion of the observations falling in each bin.
- Often recommended: plot the proportion of observations falling in the bin divided by the bin width; then the area under the histogram is 1.
Bin width:
- Too small: the histogram is too ragged.
- Too wide: the histogram is oversmoothed and structure is obscured.
- The choice is usually made subjectively, in an attempt to strike a balance.

Histograms

- Histograms are frequently used to display data for which there is no assumption of any stochastic model.
- Alternatively, a histogram may be viewed as an estimate of the probability density; regarded in this sense, it suffers from not being smooth.
How do we construct a smooth probability density estimate? Let w(x) be a nonnegative, symmetric weight function, centered at zero and integrating to 1 (e.g. the standard normal density). A rescaled version of w is
w_h(x) = (1/h) w(x/h).
As h → 0, w_h becomes more concentrated and peaked about zero; as h grows, it becomes more spread out and flatter. If w is the standard normal density, w_h(x) is the normal density with standard deviation h. If X_1, ..., X_n is a sample from a pdf f, an estimate of f is
f̂_h(x) = (1/n) Σ_i w_h(x − X_i),
called a kernel probability density estimate.

Histograms

Smoothing kernel. A kernel probability density estimate consists of hills centered on the observations. If w is the standard normal density, w_h(x − X_i) is a normal density with mean X_i and standard deviation h. The bandwidth h controls the smoothness of the estimating function and corresponds to the bin width of a histogram. (Figure panels: h too small; a good h; h too large.)
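The estimate f̂_h(x) = (1/n) Σ w_h(x − X_i) can be sketched directly, here with a Gaussian kernel on simulated N(0, 1) data (sample size and bandwidth are arbitrary choices):

```python
import math
import random

random.seed(3)
data = [random.gauss(0, 1) for _ in range(5000)]

def kernel_density(x, sample, h):
    """f_h(x) = (1/n) * sum_i w_h(x - X_i), with w the standard normal density."""
    c = 1 / (h * math.sqrt(2 * math.pi))
    return sum(c * math.exp(-0.5 * ((x - xi) / h)**2) for xi in sample) / len(sample)

est = kernel_density(0.0, data, h=0.25)
true = 1 / math.sqrt(2 * math.pi)    # the N(0, 1) density at 0, about 0.399
assert abs(est - true) < 0.05
```

Shrinking h makes the estimate raggeder (more variance); growing h flattens the peak (more bias), mirroring the bin-width trade-off above.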

Stem-and-leaf plots

- One disadvantage of a histogram or a probability density estimate is that information is lost; neither allows reconstruction of the original data.
- A histogram does not allow calculation of a statistic such as the median: one can tell which bin it falls in, but not its actual value.
Tukey (1977): stem-and-leaf plots convey information about shape while retaining the numerical information. Easy for humans AND computers. In the plot shown:
- Column 1: cumulative counts (from each end in toward the median).
- Column 2: the number of leaves on each stem.
- Column 3: the integer part of the data × 10 (the stems).
- Column 4: the digits of the data × 10 after the decimal point (the leaves).
The plot conveys shape information, like a histogram turned on its side.

Problem set #5

1. Reproduce the figure for Michelson's data on the speed of light.
2. Generate values of a uniformly distributed random variable between 0 and 1. Compare them with a uniform CDF model using the K-S test. You do not need to write the code yourself; you may borrow it from anywhere you like (e.g. NASA's IDL library astrolib has the routine ksone.pro).
3. Draw the stem-and-leaf plot from the lecture by hand.