Joyce, Krone, and Kurtz


1 Statistical Inference for Population Genetics Models

Paul Joyce, University of Idaho

A large body of mathematical population genetics was developed by the three main speakers in this symposium. As a tribute to the substantial contributions of Ewens, Griffiths and Tavaré, I will present an overview of some of my work, which builds upon their ideas. The focus will be on issues in the realm of mathematical statistics. The likelihood functions are based on the stationary distributions, under both infinite- and K-alleles models, involving mutation, selection and genetic drift. The theoretical portion of the talk will consider limiting results that determine under what conditions models can be distinguished based on allele frequency data at a single locus. The computational portion of the talk will focus on new, computationally efficient approaches to analyzing data under these models.

A Brief History of the Problem

In the late 1990s John Gillespie challenged Tom Kurtz, Steve Krone and myself to come up with a rigorous proof of his conjecture that the heterozygote advantage model converges to the neutral model in the limit as both θ and σ go to infinity at the same rate. Recall that θ = 4Nu and σ = 4Ns. In 2002 and 2003 Steve Krone, Tom Kurtz and I published two papers in the Annals of Applied Probability addressing the problem posed by Gillespie. For purely mathematical reasons I then decided to consider the homozygote advantage model and developed an analogous result, with some help from my colleague Frank Gao.

Heterozygote Advantage Model (Gillespie; Joyce, Krone, and Kurtz): Notation and Vocabulary Review

What does a sample from a neutral population look like? (My version of Warren's slide.)

Unscaled parameters:
- N = effective population size
- fitness of heterozygote = 1, fitness of homozygote = w, with w < 1
- u = per-individual mutation rate

Scaled parameters:
- w = 1 − σ/(4N), equivalently σ = (1 − w)·4N
- θ = 4Nu

2 The Effects of Selection

The probability that two individuals chosen at random are of the same type is the homozygosity F = Σ_{i=1}^N X_i². The heterozygote advantage model penalizes homozygotes, thus decreasing F. Recall from calculus that the minimum value of F = Σ_{i=1}^N X_i², subject to the constraint Σ_{i=1}^N X_i = 1, occurs when X_i = 1/N. Selection therefore tends to make the allele frequencies more evenly distributed; it is sometimes referred to as balancing selection.

With σ = 4N(1 − w) and θ = 4Nu, both the scaled mutation rate and the scaled selection intensity become large as the population size increases. An increase in the mutation parameter θ tends to increase the number of alleles and decrease the homozygosity. An increased selection intensity also decreases the homozygosity. Can high mutation mask selection when the population is large?

[Figures: simulated allele frequency configurations for several (σ, θ) settings; panel labels garbled in transcription.]

Stationary Distribution under Neutrality

Let V_1, V_2, ... be i.i.d. with beta density f(x) = θ(1 − x)^{θ−1}. The joint distribution of the population proportions X = (X_1, X_2, ...) under neutrality is given by the stick-breaking construction

X_1 = V_1,  X_i = (1 − V_1)(1 − V_2)···(1 − V_{i−1}) V_i,  (1)

and we write X ~ µ.
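The stick-breaking construction above is straightforward to simulate. A minimal Python sketch (the truncation length, the θ value, the replicate count and the Monte Carlo check against the standard neutral identity E[F] = 1/(1 + θ) are my illustrative choices, not from the talk):

```python
import random

def stick_breaking(theta, n=400, rng=random):
    """Simulate the first n population proportions X_1, ..., X_n under
    neutrality: X_i = (1-V_1)...(1-V_{i-1}) V_i with V_i i.i.d.
    Beta(1, theta), i.e. density theta*(1-x)^(theta-1)."""
    x, remaining = [], 1.0
    for _ in range(n):
        v = rng.betavariate(1.0, theta)
        x.append(remaining * v)
        remaining *= 1.0 - v
    return x

def homozygosity(x):
    """F = sum_i X_i^2: probability two random individuals match."""
    return sum(xi * xi for xi in x)

random.seed(1)
theta = 10.0
reps = [homozygosity(stick_breaking(theta)) for _ in range(4000)]
mean_F = sum(reps) / len(reps)
# Under neutrality E[F] = 1/(1+theta), here 1/11.
```

The truncation at n sticks is harmless because the leftover mass decays geometrically, like (θ/(θ+1))^n.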

3 Stationary Distribution under Selection

The stationary distribution under selection depends on the population homozygosity, F = Σ X_i². The form of the stationary distribution µ_σ follows as a special case of Ethier and Kurtz (1994):

µ_σ(A) = ∫_A e^{−σF} / E(e^{−σF}) µ(dx),  (2)

that is, dµ_σ/dµ = e^{−σF} / E[e^{−σF}].

Samples versus Populations

Let A_n be a random partition structure of a sample of size n. Then

P_σ(A_n = a_n) / P(A_n = a_n) = E( dµ_σ/dµ (X) | A_n = a_n )

and

lim_{n→∞} P_σ(A_n = a_n) / P(A_n = a_n) = dµ_σ/dµ (X),

where P(A_n = a_n) is the Ewens Sampling Formula. See Joyce (1994) for more details.

Theorem 4.4 in Ethier and Kurtz (1994) (Gillespie's conjecture): if σ = cθ, then

lim_{θ→∞} dµ_σ/dµ (X) = lim_{θ→∞} exp{−σ Σ X_i²} / E[exp{−σ Σ X_i²}] = 1.  (3)

Joyce, Krone and Kurtz (2003), Theorem 1. Suppose X = (X_1, X_2, ...) ~ µ and Y = (Y_1, Y_2, ...) ~ µ_σ, where σ = cθ^{3/2+γ} and c > 0 is a constant, and let F(X) = Σ X_i². Then, as θ → ∞,

dµ_σ/dµ (X) = e^{−σF(X)} / E(e^{−σF(X)}) →
  1, if γ < 0;
  exp{−cZ − c²}, if γ = 0;
  0, if γ > 0,

where Z ~ N(0, 2).

Outline of the proof of Theorem 1

Define

Z_θ = √θ (θ Σ X_i² − 1).  (4)

For σ = cθ^{3/2} (the case γ = 0), rewrite

dµ_σ/dµ (X_θ) = e^{−σ Σ X_i²} / E(e^{−σ Σ X_i²}) = exp{−cZ_θ} / E(exp{−cZ_θ}).  (5)

We need to show that, as θ → ∞:
1. Z_θ ⇒ Z, so that exp{−cZ_θ} ⇒ exp{−cZ};
2. E(exp{−cZ_θ}) → E(exp{−cZ}).
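The centered and scaled homozygosity Z_θ = √θ(θ Σ X_i² − 1) from the proof outline can be examined by direct simulation. A rough Monte Carlo sketch (θ = 30, the truncation depth and the replicate count are my choices, not from the talk), checking that Z_θ is roughly centered with variance near the limiting value 2:

```python
import random

def homozygosity_pd(theta, n_sticks=600, rng=random):
    """Sample F = sum X_i^2 under the neutral model via
    stick-breaking with V_i ~ Beta(1, theta)."""
    f, remaining = 0.0, 1.0
    for _ in range(n_sticks):
        v = rng.betavariate(1.0, theta)
        xi = remaining * v
        f += xi * xi
        remaining *= 1.0 - v
    return f

random.seed(2)
theta = 30.0
M = 2000
z = [theta ** 0.5 * (theta * homozygosity_pd(theta) - 1.0) for _ in range(M)]
mean_z = sum(z) / M
var_z = sum((zi - mean_z) ** 2 for zi in z) / (M - 1)
# Z_theta => Z ~ N(0, 2); for finite theta the mean is slightly
# negative (E[theta*F] = theta/(theta+1)) and the right tail is heavy.
```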

4 Z_θ Has a Heavy Right Tail

We do have E(exp{−cZ_θ}) → E(exp{−cZ}) as θ → ∞, but E(exp{+cZ_θ}) → ∞ as θ → ∞: the distribution of Z_θ has a heavy right tail, so convergence of the moment generating function holds only on the negative side.

[Figures: distribution of Z_θ versus the limiting distribution of Z.]

Homozygote Advantage Model

Unscaled parameters:
- N = effective population size
- fitness of heterozygote = 1, fitness of homozygote = w, with w > 1
- u = per-individual mutation rate

Scaled parameters:
- w = 1 + σ/(4N), equivalently σ = (w − 1)·4N
- θ = 4Nu

Joyce and Gao (2006), Homozygote Advantage Theorem

Let c* be the solution to

((1 − √(1 − 2/c))/2) · exp{ c ((1 + √(1 − 2/c))/2)² } = 1.  (6)

Suppose X = (X_1, X_2, ...) ~ µ and Y = (Y_1, Y_2, ...) ~ µ_σ, and let σ(θ) = cθ. As θ → ∞,

dµ_σ/dµ (X) = e^{cθ Σ X_i²} / E(e^{cθ Σ X_i²}) →
  1, if c < c*;
  0, if c ≥ c*;

dµ_σ/dµ (Y) = e^{cθ Σ Y_i²} / E(e^{cθ Σ Y_i²}) →
  1, if c < c*;
  ∞, if c ≥ c*.

What is c*? Recall that θ = 4Nu and σ = 4N(w − 1), where w > 1. If σ = cθ, then c = σ/θ = (w − 1)/u.

Theorem in words: at a highly polymorphic locus (θ large) the homozygote advantage model is readily distinguishable from the neutral model provided the selection coefficient w − 1 is at least c* ≈ 2.4554 times bigger than the per-individual mutation rate u. However, if the selective advantage is below 2.4554 times the mutation rate, then the models are indistinguishable in the limit.
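The critical-value equation is easy to solve numerically: with f_c(x) = e^{cx²}(1 − x) and its local maximum at x₁ = (1 + √(1 − 2/c))/2, the critical c* makes f_c(x₁) = 1. A bisection sketch (the bracket and iteration count are my choices) that recovers c* ≈ 2.45541:

```python
import math

def f_crit(c):
    """Value of f_c at its local maximum x1 = (1 + sqrt(1 - 2/c))/2,
    where f_c(x) = exp(c*x^2) * (1 - x); requires c > 2."""
    x1 = (1.0 + math.sqrt(1.0 - 2.0 / c)) / 2.0
    return math.exp(c * x1 * x1) * (1.0 - x1)

# Bisection for the root of f_crit(c) = 1 on (2, 10]:
# f_crit rises through 1 as c grows past the critical value.
lo, hi = 2.000001, 10.0
for _ in range(80):
    mid = 0.5 * (lo + hi)
    if f_crit(mid) < 1.0:
        lo = mid
    else:
        hi = mid
c_star = 0.5 * (lo + hi)
# c_star ~ 2.45541: the models separate when (w - 1)/u exceeds it.
```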

5 Proof

Let V_1, V_2, ... be i.i.d. with beta density θ(1 − x)^{θ−1}. The joint distribution of the population proportions X = (X_1, X_2, ...) is defined by the stick-breaking construction

X_1 = V_1,  X_i = (1 − V_1)(1 − V_2)···(1 − V_{i−1}) V_i.

If F = Σ X_i², then

F = V_1² + (1 − V_1)² F′,

where F′ has the same distribution as F and is independent of V_1. If V has beta density θ(1 − x)^{θ−1}, then

E(e^{σF}) = E( e^{σV²} e^{σ(1−V)²F′} ) ≥ E(e^{σV²}) = E(e^{cθV²}),

and

E(e^{cθV²}) = ∫₀¹ e^{cθx²} θ(1 − x)^{θ−1} dx ≈ θ ∫₀¹ (e^{cx²}(1 − x))^θ dx = θ ∫₀¹ (f_c(x))^θ dx.

Finding the Critical c*

f_c(x) = e^{cx²}(1 − x) has a local minimum at x₀ = (1 − √(1 − 2/c))/2 and a local maximum at x₁ = (1 + √(1 − 2/c))/2, provided c is larger than 2.

[Figures: f_c(x) for small c, for large c, and for the critical c*.]
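The Laplace-type bound above can be checked numerically: E(e^{cθV²}) = ∫₀¹ e^{cθx²} θ(1 − x)^{θ−1} dx stays bounded when c < c* ≈ 2.455 and grows geometrically in θ when c > c*. A quadrature sketch (the grid size, the θ values and the probe constants c = 2.3 and c = 2.6 on either side of c* are my choices):

```python
import math

def mgf_beta(c, theta, m=20000):
    """Trapezoid approximation of E[exp(c*theta*V^2)] for
    V ~ Beta(1, theta): the integral over [0, 1] of
    exp(c*theta*x^2) * theta * (1-x)^(theta-1).
    Works with the log-integrand; the integrand vanishes at
    x = 1 for theta > 1, so that endpoint is skipped."""
    h = 1.0 / m
    total = 0.0
    for j in range(m):
        x = j * h
        log_f = c * theta * x * x + (theta - 1.0) * math.log(1.0 - x)
        total += (0.5 if j == 0 else 1.0) * math.exp(log_f)
    return theta * h * total

# Below c* the expectation settles down as theta grows; above c*
# the interior maximum of f_c dominates and the integral blows up.
sub = mgf_beta(2.3, 200.0) / mgf_beta(2.3, 100.0)
sup = mgf_beta(2.6, 200.0) / mgf_beta(2.6, 100.0)
```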

6 c* is the constant that makes f_c(x₁) = 1, where x₁ = (1 + √(1 − 2/c))/2 is the local maximum:

((1 − √(1 − 2/c))/2) · exp{ c ((1 + √(1 − 2/c))/2)² } = 1.  (7)

If c > c*, then

E(e^{cθF}) ≥ θ ∫₀¹ (f_c(x))^θ dx → ∞ as θ → ∞.

Large Deviations and the Homozygote Advantage (Feng and Dawson, 2005)

Theorem (Varadhan). Assume that {Q_ε : ε > 0} satisfies the Large Deviation Principle with speed 1/ε and rate function I(·). Let C_b(E) denote the set of bounded continuous functions on E. Then for any φ(x) in C_b(E), one has

Λ_φ = lim_{ε→0} ε log E_{Q_ε}( e^{φ(x)/ε} ) = sup_{x∈E} {φ(x) − I(x)}.

For our case,

E = {(x_1, x_2, ...) : x_i ≥ 0 and Σ x_i = 1},  ε = 1/θ,  φ(x) = c Σ x_i²,

and

lim_{θ→∞} (1/θ) log E( e^{cθ Σ x_i²} ) = sup_{x∈E} {φ(x) − I(x)} = sup_{x₁} log(f_c(x₁)),

which is > 0 when c > c*.

Conclusion

The models (selection versus neutrality) separate when the selection intensity is large relative to the mutation rate. For the heterozygote advantage model, σ must be much larger than θ (σ ≈ cθ^{3/2+γ}) before the models separate. For the homozygote advantage model, σ need only be moderately larger than θ before the models separate: σ ≥ c*θ, where c* ≈ 2.45541. The large deviation result provides a rate of convergence when c > c*.
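The variational formula above reduces, as the slides indicate, to sup_{x₁} log f_c(x₁) with f_c(x) = e^{cx²}(1 − x). This supremum can be evaluated directly on a grid: it equals 0 for c below c* (attained at x₁ = 0) and is strictly positive above it. A sketch (the grid resolution and the probe values c = 2.3 and c = 2.6 are my choices):

```python
import math

def sup_log_fc(c, m=100000):
    """Grid approximation of sup over x in [0, 1) of
    log f_c(x) = c*x^2 + log(1-x), the large-deviation growth
    rate of (1/theta) * log E[exp(c*theta*sum x_i^2)]."""
    best = float("-inf")
    for j in range(m):
        x = j / m
        best = max(best, c * x * x + math.log(1.0 - x))
    return best

rate_sub = sup_log_fc(2.3)  # c < c*: supremum is 0, at x = 0
rate_sup = sup_log_fc(2.6)  # c > c*: supremum strictly positive
```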

7 Introduction

Any assessment of the forces that generate and maintain genetic diversity must include the possibility of selection. Computationally intensive methods for approximating likelihood functions and generating samples for a class of nonneutral models were proposed by Donnelly, Nordborg, and Joyce (DNJ) (2001).

Benefit

The new methods make likelihood analysis practicable for a wider set of parameters. In particular, if the selection intensity is much greater than the mutation rate, then the DNJ (2001) methods become increasingly inefficient; yet this is precisely the case where one has the best hope of drawing meaningful (more precise) inferences. We develop algorithms for likelihood analysis that are substantially more efficient than those in DNJ (2001).

Calculating the Constant of Integration (Law of Large Numbers)

Simulate many population frequencies X_1, X_2, ..., X_M under neutrality and average. That is,

E_N( e^{X′ΣX} ) ≈ Σ_{i=1}^M e^{X_i′ΣX_i} / M.  (8)

See DNJ (2001). This works fine if the selective influences are relatively small. However, when selection is small there is very little power to detect selection from neutrality, and likelihood analysis gives little to no information about the parameters of interest. When selection is large enough to be detected, the above method is extremely inefficient.

Simulating Data under Selection (Rejection Method)

1. Simulate X from the neutral model.
2. Simulate U, an independent uniform random variable on [0, 1].
3. If U ≤ e^{σ(X) − σ_max}, where σ(X) = X′ΣX and σ_max is its maximum value, report X as a population frequency from the nonneutral model. Otherwise return to step 1.

See DNJ (2001). For large σ it can take on the order of 10⁹ rejections before a sample is accepted.
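The rejection scheme can be sketched for a K-allele model, specializing σ(x) to a diagonal quadratic form Σ σ_i x_i² as in the diagonal-Σ case considered later in the talk (the Dirichlet proposal via gamma variables, the parameter values, nonnegative σ_i, and the two selection strengths are my illustrative choices):

```python
import math
import random

def dirichlet(alphas, rng=random):
    """Sample from Dirichlet(alphas) via normalized gamma variables."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    s = sum(g)
    return [v / s for v in g]

def rejection_sample(theta_nu, sigma, n_trials, rng=random):
    """DNJ-style rejection: propose from the neutral Dirichlet(theta*nu)
    model, accept with probability exp(sigma(x) - sigma_max), where
    sigma(x) = sum_i sigma_i * x_i^2 (diagonal selection matrix,
    sigma_i >= 0, so sigma(x) <= max_i sigma_i on the simplex).
    Returns (accepted samples, acceptance rate)."""
    smax = max(sigma)
    accepted = []
    for _ in range(n_trials):
        x = dirichlet(theta_nu, rng)
        sx = sum(s * xi * xi for s, xi in zip(sigma, x))
        if rng.random() <= math.exp(sx - smax):
            accepted.append(x)
    return accepted, len(accepted) / n_trials

random.seed(3)
theta_nu = [1.0, 1.0, 1.0, 1.0]
acc_mild, rate_mild = rejection_sample(theta_nu, [5.0, 0.0, 0.0, 0.0], 20000)
acc_strong, rate_strong = rejection_sample(theta_nu, [20.0, 0.0, 0.0, 0.0], 20000)
# The acceptance rate collapses as selection strengthens: exactly
# the inefficiency that motivates a better proposal distribution.
```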
Importance Sampling and the Rejection Method

The rejection method involves generating random variables under the proposal distribution and then applying a rule for accepting or rejecting each simulated value, so that the accepted random variables are distributed according to the target distribution. Importance sampling also involves generating random variables under the proposal distribution, but instead forms a weighted average, such that the weighted average represents the expectation, under the target distribution, of a random quantity of interest.
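The importance-sampling alternative keeps every neutral draw and reweights it. A sketch for the same diagonal-selection target (the self-normalized estimator, the quantity being estimated and all parameter values are my illustrative choices, not from the talk):

```python
import math
import random

def dirichlet(alphas, rng=random):
    """Sample from Dirichlet(alphas) via normalized gamma variables."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    s = sum(g)
    return [v / s for v in g]

def is_estimate(h, theta_nu, sigma, n, rng=random):
    """Self-normalized importance sampling: proposal = neutral
    Dirichlet(theta*nu), target proportional to
    proposal * exp(sum_i sigma_i x_i^2), so the weight is
    w(x) = exp(sum_i sigma_i x_i^2).  Estimates E_target[h(X)]
    as sum(w*h) / sum(w)."""
    num = den = 0.0
    for _ in range(n):
        x = dirichlet(theta_nu, rng)
        w = math.exp(sum(s * xi * xi for s, xi in zip(sigma, x)))
        num += w * h(x)
        den += w
    return num / den

random.seed(4)
# Selection favoring allele 1 shifts its expected frequency well
# above the neutral value E[X_1] = 1/4.
est = is_estimate(lambda x: x[0], [1.0] * 4, [5.0, 0.0, 0.0, 0.0], 40000)
```

Unlike rejection, no draw is wasted, but the estimator degrades when the weights become highly variable, i.e. when the proposal is far from the target.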

8 A Good Proposal Distribution

A good proposal distribution should have the following two properties:

1. It should be easy to simulate data and to calculate probabilities of interest with respect to the proposal distribution.
2. The proposal distribution should be, in some sense, close to the target distribution.

In DNJ (2001) the neutral model is the proposal distribution and the model with selection is the target distribution. While the neutral model has property 1, it does not have property 2: in that sense it is a bad proposal distribution.

Computation of the Normalization Constant when Σ is Diagonal

We consider the special case where Σ is a diagonal matrix, and denote the entries of the diagonal by Σ = (σ_1, σ_2, ..., σ_K). The normalization constant for the distribution can then be calculated by a series of recursive integrals. Define α_i = θν_i − 1 and g_K(y) = y^{α_K} e^{σ_K y²}. Then

c(σ, θν) = ∫₀¹ x_1^{α_1} e^{σ_1 x_1²} ∫₀^{1−x_1} x_2^{α_2} e^{σ_2 x_2²} ··· ∫₀^{1 − x_1 − ··· − x_{K−2}} x_{K−1}^{α_{K−1}} e^{σ_{K−1} x_{K−1}²} g_K(1 − x_1 − ··· − x_{K−1}) dx_{K−1} ··· dx_1.

Setting y = 1 − x_1 − ··· − x_{K−2} and t = x_{K−1}, the innermost integral is

∫₀^y t^{α_{K−1}} e^{σ_{K−1} t²} g_K(y − t) dt =: g_{K−1}(y),

and the same collapsing step can be repeated for each remaining variable.

Lyme Disease Sample

The following data were collected by Qiu et al. (1997), Hereditas 127: 203–216, on B. burgdorferi (the cause of Lyme disease) from eastern Long Island, New York.

[Table: observed allele counts and relative frequencies; entries garbled in transcription.]

The maximum likelihood estimates are θ̂ = 5 and σ̂ = 36. A total of 10⁶ repetitions per θ were used with the DNJ (2001) method.

Constant of Integration

[Table: approximations (scaled by 10⁷) of c_m(36, 5ν) / c_m(36, 5(1, 1, 1, 1)/4) for several grid sizes m; entries garbled in transcription.] The time complexity for computing the m values g_i(y) is O(m log(m)).

[Figure: likelihood surface for the Lyme disease data.]

Simulated Data

A simulated data set from Xu, with K = 20, θ = 15 and σ = 65. [The list of 20 relative allele frequencies is garbled in transcription.] The original simulation was performed using the DNJ (2001) rejection method; a very large number of rejections were required before the data set was accepted.
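The recursion g_i(y) = ∫₀^y t^{α_i} e^{σ_i t²} g_{i+1}(y − t) dt can be sketched numerically with plain trapezoid quadrature (the grid size, the K = 3 examples, and the two cross-checks are my choices; the talk's implementation achieves O(m log m) per level, presumably via fast convolution, whereas this direct version is O(m²)):

```python
import math
import random

def normalization_constant(sigma, alpha, m=400):
    """Approximate c(sigma, theta*nu): the integral over the simplex
    of prod_i x_i^(alpha_i) * exp(sigma_i * x_i^2), via
        g_K(y) = y^a_K exp(s_K y^2),
        g_i(y) = int_0^y t^a_i exp(s_i t^2) g_{i+1}(y - t) dt,
    returning g_1(1).  Trapezoid rule on a grid of m intervals;
    assumes alpha_i >= 0 so the integrands are bounded."""
    K = len(sigma)
    h = 1.0 / m
    grid = [j * h for j in range(m + 1)]
    g = [y ** alpha[K - 1] * math.exp(sigma[K - 1] * y * y) for y in grid]
    for i in range(K - 2, -1, -1):
        kern = [grid[k] ** alpha[i] * math.exp(sigma[i] * grid[k] ** 2)
                for k in range(m + 1)]
        new_g = [0.0] * (m + 1)   # g_i(0) = 0 for i < K
        for j in range(1, m + 1):
            vals = [kern[k] * g[j - k] for k in range(j + 1)]
            new_g[j] = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
        g = new_g
    return g[-1]

# Check 1: sigma = 0, alpha = 0 (theta*nu_i = 1, K = 3) gives the
# area of the 2-simplex, 1/2.
c_flat = normalization_constant([0.0, 0.0, 0.0], [0.0, 0.0, 0.0])

# Check 2 (Monte Carlo): the uniform density on the K = 3 simplex
# is 2, so c = E_uniform[exp(sum sigma_i x_i^2)] / 2.
random.seed(5)
sigma = [2.0, 1.0, 0.0]
tot, n = 0.0, 200000
for _ in range(n):
    e = [random.expovariate(1.0) for _ in range(3)]
    s = sum(e)
    x = [v / s for v in e]
    tot += math.exp(sum(si * xi * xi for si, xi in zip(sigma, x)))
c_rec = normalization_constant(sigma, [0.0, 0.0, 0.0])
c_mc = tot / n / 2.0
```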

9 The Integral Is Iteratively Defined

c(σ, θν) = ∫₀¹ x_1^{α_1} e^{σ_1 x_1²} ∫₀^{1−x_1} x_2^{α_2} e^{σ_2 x_2²} ··· ∫₀^{1 − x_1 − ··· − x_{K−2}} x_{K−1}^{α_{K−1}} e^{σ_{K−1} x_{K−1}²} g_K(1 − x_1 − ··· − x_{K−1}) dx_{K−1} ··· dx_1.

Now let y = 1 − x_1 − ··· − x_{i−1} and t = x_i, with α_i = θν_i − 1. The successive integrals are defined by

g_i(y) = ∫₀^y t^{α_i} e^{σ_i t²} g_{i+1}(y − t) dt,  (10)

for i = K−1, K−2, ..., 1. The required c(σ, θν) is given by g_1(1).
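The same grid machinery yields a direct sampler by inverting the conditional distribution functions F_i. A sketch for the simplest case K = 2, where a single inversion determines X_1 and X_2 = 1 − X_1 (the grid size, parameter values and the grid-resolution inverse are my illustrative choices, not the talk's implementation):

```python
import bisect
import math
import random

def sample_two_allele(sigma, alpha, n, m=2000, rng=random):
    """Inverse-CDF sampling for a K = 2 model with density
    proportional to x^a1 (1-x)^a2 exp(s1 x^2 + s2 (1-x)^2) in
    x = X_1 (so X_2 = 1 - X_1).  F_1(y; 1) is the normalized
    integral of t^a1 exp(s1 t^2) g_2(1 - t), with
    g_2(y) = y^a2 exp(s2 y^2), built as a trapezoid-rule CDF
    on a grid of m intervals."""
    s1, s2 = sigma
    a1, a2 = alpha
    h = 1.0 / m
    dens = [(j * h) ** a1 * math.exp(s1 * (j * h) ** 2)
            * (1 - j * h) ** a2 * math.exp(s2 * (1 - j * h) ** 2)
            for j in range(m + 1)]
    cdf = [0.0]
    for j in range(m):
        cdf.append(cdf[-1] + 0.5 * h * (dens[j] + dens[j + 1]))
    total = cdf[-1]            # this is g_1(1), the normalization
    cdf = [v / total for v in cdf]
    out = []
    for _ in range(n):
        u = rng.random()
        j = bisect.bisect_left(cdf, u)   # grid cell containing u
        out.append(j * h)                # grid-resolution inverse CDF
    return out

random.seed(6)
# Selection favoring allele 1 (s1 > s2) pushes X_1 above the
# symmetric neutral mean of 1/2 here (alpha symmetric).
xs = sample_two_allele((3.0, 0.0), (0.0, 0.0), 20000)
mean_x1 = sum(xs) / len(xs)
```

Every draw is accepted, in contrast to the rejection method; refining the inverse by interpolating within a grid cell is straightforward if more accuracy is needed.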

10 Likelihood Surface, Simulated Data

[Table: approximations (scaled by 10⁹) of c_m(67.5, 1.65ν) / c_m(67.5, 1.65(1, ..., 1)/20); numbers partially garbled in transcription.]

New Method for Simulating Samples under Selection

Define the following cumulative distribution functions with parameter z, F_i(·; z) for i = 1, 2, ..., K:

F_i(y; z) = ( ∫₀^y t^{α_i} e^{σ_i t²} g_{i+1}(z − t) dt ) / g_i(z) for 0 ≤ y ≤ z, and F_i(y; z) = 1 for y > z,

where g_i(y) is defined by (10).

Generating allele frequencies under selection:

1. Generate U_i ~ UNIF[0, 1].
2. Define X_i = F_i^{−1}(U_i; 1 − X_1 − X_2 − ··· − X_{i−1}).

Note that P(X_i ≤ y | X_{i−1}, ..., X_1) = F_i(y; 1 − X_1 − ··· − X_{i−1}).

Parametric Bootstrap

[Tables: parametric bootstrap estimates of the mean and standard deviation of the maximum likelihood estimators θ̂ and σ̂, for the Lyme disease data and for the simulated data; entries garbled in transcription.] The two tables represent estimates of the mean and standard deviation of the maximum likelihood estimates θ̂ and σ̂ based on the parametric bootstrap procedure.

Conclusions

Importance sampling and the rejection method are powerful tools for modern likelihood-based statistical analysis. DNJ (2001) use this approach for the analysis of a class of nonneutral population genetics models. The efficiency of these procedures depends critically on the choice of the proposal distribution. Our method generates data directly under the model with selection, and so is much more efficient than the methods described in DNJ (2001).


Problem 1 (20) Log-normal. f(x) Cauchy ORF 245. Rigollet Date: 11/21/2008 Problem 1 (20) f(x) f(x) 0.0 0.1 0.2 0.3 0.4 0.0 0.2 0.4 0.6 0.8 4 2 0 2 4 Normal (with mean -1) 4 2 0 2 4 Negative-exponential x x f(x) f(x) 0.0 0.1 0.2 0.3 0.4 0.5

More information

36. Multisample U-statistics and jointly distributed U-statistics Lehmann 6.1

36. Multisample U-statistics and jointly distributed U-statistics Lehmann 6.1 36. Multisample U-statistics jointly distributed U-statistics Lehmann 6.1 In this topic, we generalize the idea of U-statistics in two different directions. First, we consider single U-statistics for situations

More information

Math 152. Rumbos Fall Solutions to Assignment #12

Math 152. Rumbos Fall Solutions to Assignment #12 Math 52. umbos Fall 2009 Solutions to Assignment #2. Suppose that you observe n iid Bernoulli(p) random variables, denoted by X, X 2,..., X n. Find the LT rejection region for the test of H o : p p o versus

More information

LAN property for ergodic jump-diffusion processes with discrete observations

LAN property for ergodic jump-diffusion processes with discrete observations LAN property for ergodic jump-diffusion processes with discrete observations Eulalia Nualart (Universitat Pompeu Fabra, Barcelona) joint work with Arturo Kohatsu-Higa (Ritsumeikan University, Japan) &

More information

1 Probability theory. 2 Random variables and probability theory.

1 Probability theory. 2 Random variables and probability theory. Probability theory Here we summarize some of the probability theory we need. If this is totally unfamiliar to you, you should look at one of the sources given in the readings. In essence, for the major

More information

Approximate Bayesian Computation

Approximate Bayesian Computation Approximate Bayesian Computation Michael Gutmann https://sites.google.com/site/michaelgutmann University of Helsinki and Aalto University 1st December 2015 Content Two parts: 1. The basics of approximate

More information

Evolution in a spatial continuum

Evolution in a spatial continuum Evolution in a spatial continuum Drift, draft and structure Alison Etheridge University of Oxford Joint work with Nick Barton (Edinburgh) and Tom Kurtz (Wisconsin) New York, Sept. 2007 p.1 Kingman s Coalescent

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

Asymptotical distribution free test for parameter change in a diffusion model (joint work with Y. Nishiyama) Ilia Negri

Asymptotical distribution free test for parameter change in a diffusion model (joint work with Y. Nishiyama) Ilia Negri Asymptotical distribution free test for parameter change in a diffusion model (joint work with Y. Nishiyama) Ilia Negri University of Bergamo (Italy) ilia.negri@unibg.it SAPS VIII, Le Mans 21-24 March,

More information

On detection of unit roots generalizing the classic Dickey-Fuller approach

On detection of unit roots generalizing the classic Dickey-Fuller approach On detection of unit roots generalizing the classic Dickey-Fuller approach A. Steland Ruhr-Universität Bochum Fakultät für Mathematik Building NA 3/71 D-4478 Bochum, Germany February 18, 25 1 Abstract

More information

General Theory of Large Deviations

General Theory of Large Deviations Chapter 30 General Theory of Large Deviations A family of random variables follows the large deviations principle if the probability of the variables falling into bad sets, representing large deviations

More information

Infinitely divisible distributions and the Lévy-Khintchine formula

Infinitely divisible distributions and the Lévy-Khintchine formula Infinitely divisible distributions and the Cornell University May 1, 2015 Some definitions Let X be a real-valued random variable with law µ X. Recall that X is said to be infinitely divisible if for every

More information

Spring 2012 Math 541A Exam 1. X i, S 2 = 1 n. n 1. X i I(X i < c), T n =

Spring 2012 Math 541A Exam 1. X i, S 2 = 1 n. n 1. X i I(X i < c), T n = Spring 2012 Math 541A Exam 1 1. (a) Let Z i be independent N(0, 1), i = 1, 2,, n. Are Z = 1 n n Z i and S 2 Z = 1 n 1 n (Z i Z) 2 independent? Prove your claim. (b) Let X 1, X 2,, X n be independent identically

More information

Hypothesis testing: theory and methods

Hypothesis testing: theory and methods Statistical Methods Warsaw School of Economics November 3, 2017 Statistical hypothesis is the name of any conjecture about unknown parameters of a population distribution. The hypothesis should be verifiable

More information

Probability Theory and Statistics. Peter Jochumzen

Probability Theory and Statistics. Peter Jochumzen Probability Theory and Statistics Peter Jochumzen April 18, 2016 Contents 1 Probability Theory And Statistics 3 1.1 Experiment, Outcome and Event................................ 3 1.2 Probability............................................

More information

Brief Review on Estimation Theory

Brief Review on Estimation Theory Brief Review on Estimation Theory K. Abed-Meraim ENST PARIS, Signal and Image Processing Dept. abed@tsi.enst.fr This presentation is essentially based on the course BASTA by E. Moulines Brief review on

More information

By Paul A. Jenkins and Yun S. Song, University of California, Berkeley July 26, 2010

By Paul A. Jenkins and Yun S. Song, University of California, Berkeley July 26, 2010 PADÉ APPROXIMANTS AND EXACT TWO-LOCUS SAMPLING DISTRIBUTIONS By Paul A. Jenkins and Yun S. Song, University of California, Berkeley July 26, 2010 For population genetics models with recombination, obtaining

More information

MS 3011 Exercises. December 11, 2013

MS 3011 Exercises. December 11, 2013 MS 3011 Exercises December 11, 2013 The exercises are divided into (A) easy (B) medium and (C) hard. If you are particularly interested I also have some projects at the end which will deepen your understanding

More information

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.

More information

1 A simple example. A short introduction to Bayesian statistics, part I Math 217 Probability and Statistics Prof. D.

1 A simple example. A short introduction to Bayesian statistics, part I Math 217 Probability and Statistics Prof. D. probabilities, we ll use Bayes formula. We can easily compute the reverse probabilities A short introduction to Bayesian statistics, part I Math 17 Probability and Statistics Prof. D. Joyce, Fall 014 I

More information

Tail bound inequalities and empirical likelihood for the mean

Tail bound inequalities and empirical likelihood for the mean Tail bound inequalities and empirical likelihood for the mean Sandra Vucane 1 1 University of Latvia, Riga 29 th of September, 2011 Sandra Vucane (LU) Tail bound inequalities and EL for the mean 29.09.2011

More information

STAT 512 sp 2018 Summary Sheet

STAT 512 sp 2018 Summary Sheet STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}

More information

Maximum Smoothed Likelihood for Multivariate Nonparametric Mixtures

Maximum Smoothed Likelihood for Multivariate Nonparametric Mixtures Maximum Smoothed Likelihood for Multivariate Nonparametric Mixtures David Hunter Pennsylvania State University, USA Joint work with: Tom Hettmansperger, Hoben Thomas, Didier Chauveau, Pierre Vandekerkhove,

More information

Lecture 5: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics. 1 Executive summary

Lecture 5: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics. 1 Executive summary ECE 830 Spring 207 Instructor: R. Willett Lecture 5: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics Executive summary In the last lecture we saw that the likelihood

More information

The mathematical challenge. Evolution in a spatial continuum. The mathematical challenge. Other recruits... The mathematical challenge

The mathematical challenge. Evolution in a spatial continuum. The mathematical challenge. Other recruits... The mathematical challenge The mathematical challenge What is the relative importance of mutation, selection, random drift and population subdivision for standing genetic variation? Evolution in a spatial continuum Al lison Etheridge

More information

Statistical Inference of Covariate-Adjusted Randomized Experiments

Statistical Inference of Covariate-Adjusted Randomized Experiments 1 Statistical Inference of Covariate-Adjusted Randomized Experiments Feifang Hu Department of Statistics George Washington University Joint research with Wei Ma, Yichen Qin and Yang Li Email: feifang@gwu.edu

More information

Lecture 2. (See Exercise 7.22, 7.23, 7.24 in Casella & Berger)

Lecture 2. (See Exercise 7.22, 7.23, 7.24 in Casella & Berger) 8 HENRIK HULT Lecture 2 3. Some common distributions in classical and Bayesian statistics 3.1. Conjugate prior distributions. In the Bayesian setting it is important to compute posterior distributions.

More information

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu

More information

f (1 0.5)/n Z =

f (1 0.5)/n Z = Math 466/566 - Homework 4. We want to test a hypothesis involving a population proportion. The unknown population proportion is p. The null hypothesis is p = / and the alternative hypothesis is p > /.

More information

Asymptotics for posterior hazards

Asymptotics for posterior hazards Asymptotics for posterior hazards Pierpaolo De Blasi University of Turin 10th August 2007, BNR Workshop, Isaac Newton Intitute, Cambridge, UK Joint work with Giovanni Peccati (Université Paris VI) and

More information

Final Examination Statistics 200C. T. Ferguson June 11, 2009

Final Examination Statistics 200C. T. Ferguson June 11, 2009 Final Examination Statistics 00C T. Ferguson June, 009. (a) Define: X n converges in probability to X. (b) Define: X m converges in quadratic mean to X. (c) Show that if X n converges in quadratic mean

More information

S6880 #7. Generate Non-uniform Random Number #1

S6880 #7. Generate Non-uniform Random Number #1 S6880 #7 Generate Non-uniform Random Number #1 Outline 1 Inversion Method Inversion Method Examples Application to Discrete Distributions Using Inversion Method 2 Composition Method Composition Method

More information

BTRY 4830/6830: Quantitative Genomics and Genetics

BTRY 4830/6830: Quantitative Genomics and Genetics BTRY 4830/6830: Quantitative Genomics and Genetics Lecture 23: Alternative tests in GWAS / (Brief) Introduction to Bayesian Inference Jason Mezey jgm45@cornell.edu Nov. 13, 2014 (Th) 8:40-9:55 Announcements

More information

Multivariate Statistics

Multivariate Statistics Multivariate Statistics Chapter 2: Multivariate distributions and inference Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2016/2017 Master in Mathematical

More information

Lecture 4: Probabilistic Learning

Lecture 4: Probabilistic Learning DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods

More information

Topic 12 Overview of Estimation

Topic 12 Overview of Estimation Topic 12 Overview of Estimation Classical Statistics 1 / 9 Outline Introduction Parameter Estimation Classical Statistics Densities and Likelihoods 2 / 9 Introduction In the simplest possible terms, the

More information

Statistical Inference

Statistical Inference Statistical Inference Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Spring, 2006 1. DeGroot 1973 In (DeGroot 1973), Morrie DeGroot considers testing the

More information

Statistical population genetics

Statistical population genetics Statistical population genetics Lecture 7: Infinite alleles model Xavier Didelot Dept of Statistics, Univ of Oxford didelot@stats.ox.ac.uk Slide 111 of 161 Infinite alleles model We now discuss the effect

More information

Nonparametric Drift Estimation for Stochastic Differential Equations

Nonparametric Drift Estimation for Stochastic Differential Equations Nonparametric Drift Estimation for Stochastic Differential Equations Gareth Roberts 1 Department of Statistics University of Warwick Brazilian Bayesian meeting, March 2010 Joint work with O. Papaspiliopoulos,

More information