Tweedie's Formula and Selection Bias. Bradley Efron, Stanford University
2 Selection Bias
Observe z_i ~ N(µ_i, 1) for i = 1, 2, ..., N. Select the m biggest ones: z(1) > z(2) > z(3) > ... > z(m).
Question: What can we say about their corresponding µ values?
Selection bias: the µ's will usually be smaller than the selected z's.
3 [Figure] Gamma example: z_i ~ N(µ_i, 1), i = 1, 2, ..., N = 5000; histogram of the z values, with dashes marking the 100 largest z_i (the largest is z(1)).
4 [Figure] Gamma example, z_i ~ N(µ_i, 1): green curve shows the true µ values by index; blue dashes at right are the 100 largest z's; red stars indicate the corresponding µ values. The largest z is 9.02.
5 Tweedie's Formula (Robbins, 1956)
Bayes model: µ ~ g(·) and z | µ ~ N(µ, 1).
Marginal density: f(z) = ∫ φ(z − µ) g(µ) dµ, with φ(z) = e^{−z²/2}/√(2π).
Tweedie's formula (normal version): E{µ | z} = z + l′(z), where l(z) = log f(z); the MLE z plus a Bayes correction l′(z).
Advantage: only f is needed, not g.
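Tweedie's rule is easy to apply numerically once log f is in hand. A minimal Python sketch, using a central difference for l′(z) and checking against the conjugate case µ ~ N(0, A), where the exact Bayes rule Az/(A + 1) is known (the function name is illustrative):

```python
import numpy as np

def tweedie_posterior_mean(z, log_f, eps=1e-5):
    """Tweedie's rule E{mu | z} = z + l'(z), with l'(z) taken by a
    central difference of the log marginal density."""
    l_prime = (log_f(z + eps) - log_f(z - eps)) / (2 * eps)
    return z + l_prime

# Check on the conjugate case mu ~ N(0, A), z | mu ~ N(mu, 1):
# marginally z ~ N(0, A + 1), and the exact Bayes rule is A*z/(A+1).
A = 4.0
log_f = lambda z: -0.5 * z**2 / (A + 1.0)   # log marginal, up to a constant
z0 = 2.5
est = tweedie_posterior_mean(z0, log_f)     # 2.5 + l'(2.5) = 2.0
exact = A * z0 / (A + 1.0)                  # 2.0
```

Note that only the marginal log density enters; the prior g never appears, which is the point of the formula.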
6 Higher Moments
The cumulant generating function of µ given z is cgf(z) = z²/2 + l(z), with l(z) = log f(z), so
var{µ | z} = 1 + l″(z), skew{µ | z} = l‴(z)/[1 + l″(z)]^{3/2}, etc.
Approximately, µ | z ~ (z + l′(z), 1 + l″(z)).
Bayes risk (C.-H. Zhang, 1997): with µ* = E{µ | z}, E{(µ − µ*)²} = 1 − E{l′(z)²}.
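The variance formula can be checked the same way as the mean. In the conjugate N(0, A) case, l(z) = −z²/(2V) + constant with V = A + 1, so 1 + l″(z) = 1 − 1/V = A/(A + 1), the exact posterior variance (a sketch under those assumptions):

```python
import numpy as np

# Check var{mu | z} = 1 + l''(z) on the conjugate case mu ~ N(0, A):
# l(z) = -z^2/(2V) + constant with V = A + 1, so 1 + l''(z) = 1 - 1/V,
# which equals the exact posterior variance A/(A+1), constant in z.
A = 4.0
V = A + 1.0
log_f = lambda z: -0.5 * z**2 / V          # log marginal, up to a constant
eps, z0 = 1e-3, 1.7
l2 = (log_f(z0 + eps) - 2 * log_f(z0) + log_f(z0 - eps)) / eps**2
post_var = 1.0 + l2                        # 1 - 1/5 = 0.8
```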
7 Empirical Bayes Estimates
Use z = (z_1, z_2, ..., z_N) to get an estimate f̂(z) and l̂(z) = log f̂(z); take
µ_i | z_i ~ ( z_i + l̂′(z_i), 1 + l̂″(z_i) ), i.e. µ̂_i = z_i + l̂′(z_i) and var_i = 1 + l̂″(z_i).
Idea: Bayes estimates are immune to selection bias; maybe the empirical Bayes estimates µ̂_i are, too!
8 Maximum Likelihood Estimation of f(z)
Parametric model: suppose l(z) = log f(z) = Σ_{j=0}^J β_j z^j, with MLE l̂(z) = Σ_{j=0}^J β̂_j z^j.
Lindsey's method: form histogram bins Z_k, k = 1, ..., K, with counts y_k = #{z_i ∈ Z_k}; obtain l̂ from glm(y ~ poly(x, J), Poisson), where x is the vector of bin centers.
[Slide 2 used K = 63 bins of width 0.2]
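Lindsey's method is just a Poisson regression of bin counts on a polynomial basis, so it can be sketched without R's glm; here a plain Newton-Raphson loop stands in for glm(y ~ poly(x, J), Poisson) (the function name, defaults, and stabilization details are illustrative choices):

```python
import numpy as np

def lindsey(z, K=63, J=5, iters=30):
    """Lindsey's method: estimate l(z) = log f(z) by a degree-J polynomial,
    fit by Poisson regression of histogram counts on bin centers."""
    y, edges = np.histogram(z, bins=K)
    x = 0.5 * (edges[:-1] + edges[1:])              # bin centers
    xs = (x - x.mean()) / x.std()                   # standardize for stability
    X = np.vander(xs, J + 1, increasing=True)       # basis 1, xs, ..., xs^J
    beta = np.zeros(J + 1)
    beta[0] = np.log(y.mean())                      # start at the right scale
    for _ in range(iters):                          # Newton-Raphson (IRLS)
        eta = np.clip(X @ beta, -30.0, 30.0)        # guard against overflow
        mu = np.exp(eta)
        H = X.T @ (mu[:, None] * X) + 1e-8 * np.eye(J + 1)
        beta += np.linalg.solve(H, X.T @ (y - mu))
    # Convert fitted counts to a log density on the original z scale.
    width = edges[1] - edges[0]
    log_density = X @ beta - np.log(len(z) * width)
    return x, log_density
```

With z drawn from N(0, 1), the fitted log density near z = 0 should be close to log φ(0) ≈ −0.919.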
9 The James-Stein Estimator
If µ_i ~ N(0, A) and z_i | µ_i ~ N(µ_i, 1), then marginally z_i ~ N(0, V), V = A + 1.
Log marginal density: l(z_i) = −z_i²/2V + constant, which is quadratic (J = 2), giving E{µ_i | z_i} = z_i − z_i/V.
James-Stein: µ̂_i = z_i − z_i/V̂, where 1/V̂ = (N − 2)/Σ z_j².
Tweedie estimates are a generalization of James-Stein.
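A sketch of the estimator, with a quick simulation (the settings A = 4, N = 1000 are illustrative) showing that shrinkage beats the MLE z in average squared error:

```python
import numpy as np

def james_stein(z):
    """James-Stein: mu_hat_i = (1 - (N-2)/sum(z^2)) * z_i, shrinking
    toward 0; the Tweedie estimate when l(z) is quadratic and 1/V is
    estimated by (N-2)/sum(z^2)."""
    N = len(z)
    return (1.0 - (N - 2) / np.sum(z**2)) * z

rng = np.random.default_rng(1)
mu = rng.normal(0.0, 2.0, size=1000)         # prior variance A = 4, so V = 5
z = mu + rng.normal(size=1000)
mse_js = np.mean((james_stein(z) - mu)**2)   # close to A/V = 0.8
mse_mle = np.mean((z - mu)**2)               # close to 1.0
```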
10 [Figure] Fitted curve l̂(z) for the Gamma example, using Lindsey's method with a natural-spline model, J = 5 degrees of freedom; points are the log histogram counts, plotted against z.
Note: l(z) concave implies posterior variance < 1.
11 [Figure] Estimated posterior expectation Ê{µ | z} = z + l̂′(z) as a function of z; red stars are the µ̂_i's for the 100 largest z_i's, the topmost at (z, µ̂) = (9.02, 7.64); small green dots show the true Bayes rule.
12 Does Tweedie Cure Selection Bias?
Simulations: µ_i ~ Gamma and z_i ~ N(µ_i, 1) for i = 1, 2, ..., N = 1000.
100 simulations: select the top 20 and bottom 20 z_i's each time. The next histograms show µ̂_i − µ_i.
Tweedie's formula works well even for N = 200.
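The selection-bias effect, and its Tweedie correction, can be seen in a few lines. The sketch below uses a N(0, A) prior rather than the slides' Gamma prior, because there the exact rule z + l′(z) = z − z/V (V = A + 1) is available in closed form; all settings are illustrative:

```python
import numpy as np

# Selection-bias demo: the raw MLE z overshoots mu for selected cases,
# while the Tweedie-corrected estimate z - z/V is roughly centered.
rng = np.random.default_rng(2)
A, N = 4.0, 1000
V = A + 1.0
mu = rng.normal(0.0, np.sqrt(A), size=N)
z = mu + rng.normal(size=N)
top = np.argsort(z)[-20:]                    # the 20 largest z's
bias_raw = np.mean(z[top] - mu[top])         # clearly positive after selection
bias_tweedie = np.mean(z[top] - z[top] / V - mu[top])   # near zero
```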
13 [Figure] 100 simulations of the Gamma model, N = 1000 z_i's per simulation. Solid histogram: µ̂_i − µ_i for the top 20 per simulation; line histogram: z_i − µ_i.
14 [Figure] The same comparison, now for the bottom 20 per simulation.
15 Bayes Regret (Muralidharan, 2009; Zhang, 1997)
Regret: Reg(z_0) ≡ E[µ − µ̂_z(z_0)]² − E[µ − µ*(z_0)]², with z = (z_1, ..., z_N), µ | z_0 random, and z_0 fixed.
Reg(z_0) depends on the accuracy of l̂′_z(z_0) as an estimate of l′(z_0) = (d/dz) log f(z) |_{z_0}:
Reg(z_0) = E[ l̂′_z(z_0) − l′(z_0) ]².
Asymptotically, Reg(z_0) ≈ var{ l̂′_z(z_0) } ≈ c(z_0)/N.
16 [Table] RMS regret √Reg(z_0) for the Gamma example, by percentile and z-value, for several sample sizes N, alongside the asymptotic value √(c(z_0)/N). (The numerical entries were lost in transcription.)
17 Empirical Bayes Information
As N increases, µ̂_i goes from the MLE z_i (N = 1) to the Bayes estimate µ*_i (N = ∞).
Posterior variability: E[µ_i − µ̂_i]² = var_i + Reg(z_i).
µ̂_i is undependable for the most extreme few z_i's (where var_i ≈ 1).
Empirical Bayes information (per other observation):
I(z_0) = lim_{N→∞} 1/[N · Reg(z_0)] = 1/c(z_0), so that Reg(z_0) ≈ 1/[N I(z_0)].
18 [Figure] Empirical Bayes information per observation, I(z_0), plotted against z_0 for the Gamma model. (Dashed curve: James-Stein.)
19 CNV Data (Efron and Zhang, 2010)
Copy number variation: 150 healthy controls measured at 5000 genomic positions.
k̂_i = estimated number of CNV subjects at position i, i = 1, 2, ..., N = 5000.
Bootstrapping subjects suggests k̂_i ~ N(k_i, σ_i²), with σ_i² increasing in k_i.
Selected position: i = 1755, with k̂_i = 35.3 (σ_i ≈ 6.5).
Question: empirical Bayes inference for k_i? (Independence is not needed!)
20 [Figure] k̂ values plotted against position for the CNV data.
21 [Figure] Histogram of the 5000 k̂ values for the CNV example (first three frequencies: 1908, 560, 357); k̂[1755] = 35.3 sits near a second mode.
22 [Figure] Left panel: log f̂(k) = l̂(k) for the CNV data, natural spline with df = 7; stars are log bin proportions; a second mode is visible. Right panel: Tweedie's formula estimate Ê{k | k̂}, also showing the second mode.
23 Which Other Cases Are Relevant?
The EB estimate of k_1755 depends on which other k̂_i's are thought relevant: a small table on the slide compares µ̂_1755 across relevant cases 1:5000, 1:2000, etc. (the numerical entries were lost in transcription).
Large k̂'s in positions 3000:5000 pull up the estimate µ̂_1755.
Bayes problem: the relevant prior g(k) may depend on position.
24 [Figure] Gamma example with N = 5000, plotted by position: solid red points are the maximum z values in successive groups of 50; squares show the corresponding estimates Ê{µ | z} from a combined empirical Bayes analysis (upwardly biased!); the actual µ values are shown below.
25 Relevance
Let ρ_0(i) be the relevance of case i to target case i_0.
Examples: (1) ρ_0(i) = 1 if i ∈ i_0 ± 500, and 0 otherwise; (2) ρ_0(i) = exp{−|i − i_0|/500}.
Extended Tweedie formula: µ̂_{i_0} = z_{i_0} + l̂′(z_{i_0}) + (d/dz) log R̂_0(z) |_{z_{i_0}}, where R_0(z) = E{ρ_0(i) | z}.
[Regress ρ_0(i) on z_i to get R̂_0(z).]
Using (1) or (2) cures the bias in the previous panel.
26 Non-Constant Variance
If z ~ N(µ, σ_0²), then E{µ | z} = z + σ_0² l′(z) (took σ_0 = 6.5 for the CNV data).
Now suppose z ~ N(µ, σ_µ²). Theorem:
g(µ | z_0) = g_0(µ | z_0) · c_0 λ_µ e^{−(1/2)(λ_µ² − 1) µ̃²},
where σ_0 = σ_{µ = z_0}, g_0(µ | z_0) is the posterior assuming constant σ = σ_0, λ_µ = σ_0/σ_µ, and µ̃ = (µ − z_0)/σ_0.
27 [Figure] Estimated posterior distribution of k[1755] for the CNV example: assuming σ_0 = 6.5 and normality (dashed), or adjusted for varying σ (solid).
28 False Discovery Rates
µ ~ g(·) = π_0 δ_0(·) + π_1 g_1(·) and z | µ ~ N(µ, 1), where π_0 = prior Pr{null}.
Then fdr(z) = Pr{null | z} = π_0 φ(z)/f(z), the local false discovery rate, and
−(d/dz) log fdr(z) = z + l′(z) = E{µ | z}.
Tail-area Fdr: Fdr(z_0) = Pr{null | z ≥ z_0} = π_0 Φ̄(z_0)/F̄(z_0), where F̄ is the survival function of the mixture density f(z).
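The relation −(d/dz) log fdr(z) = z + l′(z) = E{µ | z} can be verified numerically for a concrete two-groups model (the π_0 = 0.9, N(3, 1) alternative below is an illustrative choice, not the slides' example):

```python
import numpy as np

# Two-groups model: mu = 0 with prob pi0, else mu ~ N(3, 1); z | mu ~ N(mu, 1).
# Numerical check of -d/dz log fdr(z) = z + l'(z) = E{mu | z}.
phi = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
pi0 = 0.9
# Non-null marginal is N(3, 2): prior variance 1 plus sampling variance 1.
f = lambda z: pi0 * phi(z) + (1 - pi0) * phi((z - 3.0) / np.sqrt(2.0)) / np.sqrt(2.0)
fdr = lambda z: pi0 * phi(z) / f(z)             # local false discovery rate

eps, z0 = 1e-5, 2.0
l_prime = (np.log(f(z0 + eps)) - np.log(f(z0 - eps))) / (2 * eps)
dlogfdr = (np.log(fdr(z0 + eps)) - np.log(fdr(z0 - eps))) / (2 * eps)
E_mu = z0 + l_prime                             # Tweedie's formula
# Direct Bayes computation: E{mu | z} = Pr{non-null | z} * (z + 3)/2,
# since the non-null posterior mean of mu given z is (z + 3)/2.
direct = (1.0 - fdr(z0)) * (z0 + 3.0) / 2.0
```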
29 [Figure] Prostate data z-values: N = 6033 genes measured on 102 subjects; z-values from two-sample t statistics comparing 52 prostate cancer patients with 50 healthy controls. Left panel: histogram of the z-values; right panel: empirical Bayes estimate Ê{µ | z}.
30 p-value Selection Bias
Observe z_1, z_2, ..., z_N and select the smallest values of p_i = Φ(z_i). Significance?
Benjamini-Hochberg: reject for small values of Fdr_i = π_0 p_i / F̂(z_i), i.e. for small
log(Fdr_i) = log(p_i) − log( F̂(z_i)/π̂_0 ),
the frequentist estimate log(p_i) plus an empirical Bayes correction (usually π̂_0 = 1).
31 [Figure] log(p-value) (dashed) and log(Fdr) (solid) for the prostate data, right-sided, plotted against the z value.
32 References
Efron, B. and Zhang, N. (2010). False discovery rates and copy number variation. brad/.
Muralidharan, O. (2009). High dimensional exponential family estimation via empirical Bayes. Under review.
Robbins, H. (1956). An empirical Bayes approach to statistics. Proc. 3rd Berkeley Symposium. Berkeley: UC Press.
Zhang, C.-H. (1997). Empirical Bayes and compound estimation of normal means. Statist. Sinica 7.