Joyce, Krone, and Kurtz
Statistical Inference for Population Genetics Models

Paul Joyce, University of Idaho

A large body of mathematical population genetics was developed by the three main speakers in this symposium. As a tribute to the substantial contributions of Ewens, Griffiths and Tavaré, I will present an overview of some of my work, which builds upon their ideas. The focus will be on issues in the realm of mathematical statistics. The likelihood functions are based on the stationary distributions, under both infinite-alleles and K-alleles models, involving mutation, selection and genetic drift. The theoretical portion of the talk considers limiting results that determine under what conditions models can be distinguished based on allele frequency data at a single locus. The computational portion of the talk focuses on new computationally efficient approaches to analyzing data under these models.

A brief history of the problem

In the late 1990s John Gillespie challenged Tom Kurtz, Steve Krone and myself to come up with a rigorous proof of his conjecture that the heterozygote advantage model converges to the neutral model in the limit as both θ and σ go to infinity at the same rate. Recall that θ = 4Nu and σ = 4Ns. In 2002 and 2003 Steve Krone, Tom Kurtz and I published two papers in the Annals of Applied Probability addressing the problem posed by Gillespie. For purely mathematical reasons I decided to consider the homozygote advantage model and developed an analogous result, with some help from my colleague Frank Gao.

Heterozygote Advantage Model: Notation and Vocabulary Review

What does a sample from a neutral population look like? (My version of Warren's slide.)

N = effective population size. Fitness of heterozygote = 1; fitness of homozygote = w, with w < 1. u = per-individual mutation rate. Scaled parameters: w = 1 − σ/(4N), equivalently σ = (1 − w)4N, and θ = 4Nu.
The Effects of Selection

The probability that two individuals chosen at random are the same type is F = Σ_{i=1}^N X_i². The heterozygote advantage model penalizes homozygotes, thus decreasing F. Recall from calculus that the minimum value of F = Σ_{i=1}^N X_i², subject to the constraint Σ_{i=1}^N X_i = 1, occurs when X_i = 1/N. Selection tends to make the allele frequencies more evenly distributed; it is sometimes referred to as balancing selection.

σ = 4N(1 − w), θ = 4Nu. As the population size increases, the mutation rate and selection intensity become large. An increase in the mutation parameter θ tends to increase the number of alleles and decrease the homozygosity. An increased selection intensity also decreases the homozygosity. Can high mutation mask selection when the population is large?

[figures: simulated allele frequency distributions for σ = θ = 0.3 and σ = θ = 4]

Stationary Distribution under Neutrality

Let V_1, V_2, ... be i.i.d. with beta density f(x) = θ(1 − x)^{θ−1}. The joint distribution of the population proportions X = (X_1, X_2, ...) under neutrality, X ∼ µ, is given by

X_1 = V_1, X_i = (1 − V_1)(1 − V_2) ··· (1 − V_{i−1})V_i. (1)
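The stick-breaking construction above is straightforward to simulate. The following is a minimal Python sketch (my own illustration, not code from the talk; function names are assumptions) that draws neutral frequencies with V_i ~ Beta(1, θ) and checks the standard identity E[F] = 1/(1 + θ) by Monte Carlo:

```python
import random

def sample_gem(theta, rng, tol=1e-12):
    """Stick-breaking (GEM) draw of neutral population frequencies:
    X_1 = V_1, X_i = (1 - V_1)...(1 - V_{i-1}) V_i, V_i i.i.d. Beta(1, theta)."""
    freqs, remaining = [], 1.0
    while remaining > tol:
        v = 1.0 - rng.random() ** (1.0 / theta)  # inverse-CDF draw from Beta(1, theta)
        freqs.append(remaining * v)
        remaining *= 1.0 - v
    return freqs

def homozygosity(freqs):
    # F = sum of squared allele frequencies
    return sum(x * x for x in freqs)

rng = random.Random(0)
theta = 5.0
mean_F = sum(homozygosity(sample_gem(theta, rng)) for _ in range(4000)) / 4000
# Under neutrality E[F] = 1/(1 + theta); here that is 1/6.
print(round(mean_F, 3))
```

Truncating the stick once the remaining mass falls below `tol` changes F only negligibly, since the discarded tail contributes at most `tol`² to the sum of squares.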
Stationary Distribution Under Selection

The stationary distribution under selection depends on the population homozygosity, which is given by F = Σ X_i². The form of the stationary distribution µ_σ follows as a special case of

µ_σ(A) = ∫_A (e^{−σF} / E(e^{−σF})) µ(dx), i.e. dµ_σ/dµ = e^{−σF} / E[e^{−σF}]. (2)

Samples versus Populations

Let A_n be a random partition structure of a sample of size n. Then

P_σ(A_n = a_n) / P(A_n = a_n) = E( (dµ_σ/dµ)(X) | A_n = a_n )

and

lim_{n→∞} P_σ(A_n = a_n) / P(A_n = a_n) = (dµ_σ/dµ)(X),

where P(A_n = a_n) is the Ewens Sampling Formula. See Joyce (1994) for more details.

Theorem 4.4 in Ethier and Kurtz (1994); Gillespie's conjecture: if σ = cθ then

lim_{θ→∞} (dµ_σ/dµ)(X) = lim_{θ→∞} exp{−σ Σ X_i²} / E[exp{−σ Σ X_i²}] = 1. (3)

Joyce, Krone and Kurtz (2003), Theorem 1. Suppose X = (X_1, X_2, ...) ∼ µ and Y = (Y_1, Y_2, ...) ∼ µ_σ, where σ = cθ^{3/2+γ} and c > 0 is a constant. Let F(X) = Σ X_i² and F(Y) = Σ Y_i². Then, as θ → ∞,

(dµ_σ/dµ)(X) = e^{−σF(X)} / E(e^{−σF(X)}) →
  1, if γ < 0;
  exp{−cZ − c²}, if γ = 0;
  0, if γ > 0,

where Z ∼ N(0, 2).

Outline of the proof of Theorem 1. Define

Z_θ = √θ (θ Σ X_i² − 1). (4)

For σ = cθ^{3/2}, rewrite

(dµ_σ/dµ)(X) = e^{−σ Σ X_i²} / E(e^{−σ Σ X_i²}) = exp{−cZ_θ} / E(exp{−cZ_θ}). (5)

For γ = 0 we need to show that, as θ → ∞:
1. Z_θ ⇒ Z, so exp{−cZ_θ} ⇒ exp{−cZ};
2. E(exp{−cZ_θ}) → E(exp{−cZ}).
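The density ratio dµ_σ/dµ ∝ e^{−σF} can be illustrated by reweighting neutral simulations. A hedged sketch (my own code with assumed parameter values, not the talk's): applying self-normalized weights e^{−σF} to stick-breaking draws pulls the mean homozygosity below its neutral value, since heterozygote advantage penalizes large F.

```python
import math
import random

def sample_gem(theta, rng, tol=1e-12):
    # neutral stick-breaking frequencies, V_i ~ Beta(1, theta)
    freqs, remaining = [], 1.0
    while remaining > tol:
        v = 1.0 - rng.random() ** (1.0 / theta)
        freqs.append(remaining * v)
        remaining *= 1.0 - v
    return freqs

rng = random.Random(0)
theta, sigma = 5.0, 10.0
Fs = [sum(x * x for x in sample_gem(theta, rng)) for _ in range(4000)]
weights = [math.exp(-sigma * F) for F in Fs]   # unnormalized density ratio e^{-sigma F}
neutral_mean_F = sum(Fs) / len(Fs)
selected_mean_F = sum(w * F for w, F in zip(weights, Fs)) / sum(weights)
# Reweighting by e^{-sigma F} downweights high-homozygosity populations,
# so the mean homozygosity under selection falls below the neutral mean.
print(round(neutral_mean_F, 3), round(selected_mean_F, 3))
```

This is exactly the change-of-measure idea behind (2): expectations under µ_σ are weighted expectations under µ.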
Z_θ has a heavy right tail

E(exp{−cZ_θ}) → E(exp{−cZ}) as θ → ∞, but E(exp{cZ_θ}) → ∞ as θ → ∞.

[figures: distribution of Z_θ versus distribution of Z]

Homozygote Advantage Model

Unscaled parameters: N = effective population size; fitness of heterozygote = 1, fitness of homozygote = w, with w > 1; u = per-individual mutation rate. Scaled parameters: w = 1 + σ/(4N), equivalently σ = (w − 1)4N, and θ = 4Nu.

Joyce and Gao (2006), Homozygote Advantage Theorem

Let c* be a solution to

((1 − √(1 − 2/c)) / 2) exp{c((1 + √(1 − 2/c)) / 2)²} = 1. (6)

Suppose X = (X_1, X_2, ...) ∼ µ and Y = (Y_1, Y_2, ...) ∼ µ_σ, and let σ(θ) = cθ. As θ → ∞,

(dµ_σ/dµ)(X) = e^{cθ Σ X_i²} / E(e^{cθ Σ X_i²}) → 1 if c < c*, and → 0 if c ≥ c*;
(dµ_σ/dµ)(Y) = e^{cθ Σ Y_i²} / E(e^{cθ Σ X_i²}) → 1 if c < c*, and → ∞ if c ≥ c*.

What is c*? Recall that θ = 4Nu and σ = 4N(w − 1), where w > 1. If σ = cθ then c = σ/θ, so c = (w − 1)/u.

Theorem in words: at a highly polymorphic locus (θ large) the homozygote advantage model is readily distinguishable from the neutral model provided the selection coefficient w − 1 is at least 2.4554 times bigger than the per-individual mutation rate u. However, if the selective advantage is below 2.4554 times the mutation rate, then the models are indistinguishable in the limit.
Proof. Let V_1, V_2, ... be i.i.d. with beta density θ(1 − x)^{θ−1}. The joint distribution of the population proportions X = (X_1, X_2, ...) is defined by X_1 = V_1, X_i = (1 − V_1)(1 − V_2) ··· (1 − V_{i−1})V_i. If F = Σ X_i² then

F = V_1² + (1 − V_1)² F',

where F and F' have the same distribution and F' is independent of V_1. If V has beta density θ(1 − x)^{θ−1} then

E(e^{σF}) = E( e^{σV²} e^{(1−V)² σF'} ), with leading factor E(e^{σV²}) = E(e^{cθV²}),

and

E(e^{cθV²}) = ∫_0^1 e^{cθx²} θ(1 − x)^{θ−1} dx ≈ θ ∫_0^1 (e^{cx²}(1 − x))^θ dx = θ ∫_0^1 (f_c(x))^θ dx.

Finding the Critical c*

f_c(x) = e^{cx²}(1 − x) has a local minimum at x_0 = (1 − √(1 − 2/c))/2 and a local maximum at x_1 = (1 + √(1 − 2/c))/2, provided c is larger than 2.

[figures: f_c(x) for small c, for large c, and for the critical c*]
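The Laplace-type approximation E(e^{cθV²}) ≈ θ∫(f_c(x))^θ dx can be probed numerically. An illustrative sketch (my own code, with assumed values θ = 100 and c = 2 versus c = 3): below the critical value the integral stays bounded in θ, while above it the interior maximum of f_c exceeds 1 and the integral blows up.

```python
import math

def f_c(x, c):
    # f_c(x) = exp(c x^2)(1 - x), the function whose theta-th power
    # controls E[exp(c * theta * V^2)] for V ~ Beta(1, theta)
    return math.exp(c * x * x) * (1.0 - x)

def I(theta, c, n=20000):
    # theta * integral_0^1 f_c(x)^theta dx, by the midpoint rule
    h = 1.0 / n
    return theta * h * sum(f_c((k + 0.5) * h, c) ** theta for k in range(n))

theta = 100.0
below, above = I(theta, 2.0), I(theta, 3.0)
# For c < c* (about 2.4554), f_c <= 1 on (0,1) and the integral is bounded;
# for c > c*, f_c exceeds 1 near its local maximum and the integral explodes.
print(below, above)
```

At c = 2 the integrand concentrates near x = 0 where f_c(0) = 1, giving a value near 1; at c = 3 the mass near x_1 grows like f_c(x_1)^θ.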
c* is the constant that makes f_c(x_1) = 1, where x_1 = (1 + √(1 − 2/c))/2 is the local maximum:

((1 − √(1 − 2/c)) / 2) exp{c((1 + √(1 − 2/c)) / 2)²} = 1. (7)

If c > c* then E(e^{cθF}) ≥ θ ∫_0^1 (f_c(x))^θ dx → ∞ as θ → ∞.

Large Deviations and the Homozygote Advantage (Feng and Dawson (2005))

Theorem (Varadhan). Assume that {Q_ε : ε > 0} satisfies the Large Deviation Principle with speed 1/ε and rate function I(·). Let C_b(E) denote the set of bounded continuous functions. Then for any φ(x) in C_b(E) one has

Λ_φ = lim_{ε→0} ε log E_{Q_ε}(e^{φ(x)/ε}) = sup_{x∈E} {φ(x) − I(x)}.

For our case E = {(x_1, x_2, ...) : x_i ≥ 0 and Σ x_i = 1}, ε = 1/θ, and φ(x) = c Σ x_i². Then

lim_{θ→∞} (1/θ) log E(e^{cθ Σ x_i²}) = sup_{x∈E} {φ(x) − I(x)} = sup_{0≤x≤1} log(f_c(x)),

and sup_{0≤x≤1} log(f_c(x)) > 0 when c > c*.

Conclusion

The models (selection versus neutrality) separate when the selection intensity is large relative to the mutation rate. For the heterozygote advantage model, σ must be much larger than θ (σ ≈ cθ^{3/2+γ}) before the models separate. For the homozygote advantage model, σ need only be moderately larger than θ before the models separate (σ ≥ c*θ, where c* ≈ 2.45541). The large deviation results provide a rate of convergence when c > c*.
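Equation (7) pins down c* numerically. A small sketch (my own code; the bracketing interval is an assumption justified by the sign change of f_c(x_1) − 1) solving f_c(x_1) = 1 by bisection reproduces the threshold quoted in the talk:

```python
import math

def excess(c):
    # f_c at its interior local maximum x1, minus 1; defined for c > 2
    s = math.sqrt(1.0 - 2.0 / c)
    x1 = (1.0 + s) / 2.0
    return math.exp(c * x1 * x1) * (1.0 - x1) - 1.0

# excess(c) < 0 just above c = 2 and > 0 for large c, so bisect for the root.
lo, hi = 2.000001, 10.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if excess(mid) < 0.0:
        lo = mid
    else:
        hi = mid
c_star = 0.5 * (lo + hi)
print(round(c_star, 5))   # about 2.45541, the threshold quoted in the talk
```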
Introduction

Any assessment of the forces that generate and maintain genetic diversity must include the possibility of selection. Computationally intensive methods for approximating likelihood functions and generating samples for a class of nonneutral models were proposed by Donnelly, Nordborg, and Joyce (DNJ) (2001).

Benefit: the new methods make likelihood analysis practicable for a wider set of parameters. In particular, if the selection intensity is much greater than the mutation rate, then the DNJ (2001) methods become increasingly inefficient. However, this is the case where one has the best hope of drawing meaningful (more precise) inferences. We develop algorithms for likelihood analysis that are substantially more efficient than those in DNJ (2001).

Calculating the constant of integration: Law of Large Numbers

Simulate many population frequencies X_1, X_2, ..., X_M under neutrality and average. That is,

E_N(e^{X'ΣX}) ≈ (1/M) Σ_{i=1}^M e^{X_i'ΣX_i}. (8)

See DNJ (2001). This works fine if the selective influences are relatively small. However, when selection is small there is very little power to detect selection from neutrality, and likelihood analysis gives little to no information about the parameters of interest. When selection is large enough to be detected, the above method is extremely inefficient.

Simulating data under selection: Rejection Method

1. Simulate X from the neutral model.
2. Simulate U, an independent uniform random variable on [0, 1].
3. If U ≤ e^{σ(X) − σ_max}, report X as a population frequency from the nonneutral model. Otherwise return to step 1.

See DNJ (2001). If σ ≈ 100, it takes on the order of 10^9 rejections before a sample is accepted.
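The rejection step can be sketched for the heterozygote-advantage density ∝ e^{−σF}, where e^{−σF} ≤ 1 supplies the envelope. This is a simplified stand-in for the DNJ scheme (my own code; θ = σ = 5 are assumed illustration values, chosen small so that the acceptance rate stays reasonable):

```python
import math
import random

def sample_gem(theta, rng, tol=1e-12):
    # neutral proposal: stick-breaking with V_i ~ Beta(1, theta)
    freqs, remaining = [], 1.0
    while remaining > tol:
        v = 1.0 - rng.random() ** (1.0 / theta)
        freqs.append(remaining * v)
        remaining *= 1.0 - v
    return freqs

def rejection_sample(theta, sigma, rng):
    """Accept a neutral draw X with probability exp(-sigma * F(X)); valid
    because exp(-sigma * F) <= 1 bounds the unnormalized density ratio."""
    tries = 0
    while True:
        tries += 1
        x = sample_gem(theta, rng)
        F = sum(xi * xi for xi in x)
        if rng.random() <= math.exp(-sigma * F):
            return x, tries

rng = random.Random(0)
theta, sigma = 5.0, 5.0
draws = [rejection_sample(theta, sigma, rng) for _ in range(500)]
mean_F_selected = sum(sum(xi * xi for xi in x) for x, _ in draws) / len(draws)
mean_tries = sum(t for _, t in draws) / len(draws)
# Accepted populations are less homozygous than the neutral average 1/(1+theta),
# and the cost in tries per acceptance grows rapidly with sigma.
print(round(mean_F_selected, 3), round(mean_tries, 1))
```

Raising σ shrinks the acceptance probability exponentially, which is exactly the inefficiency the slide describes for large selection intensities.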
Importance sampling and rejection method The rejection method involves generating random variables under the proposal distribution and then developing a rule for rejecting or accepting the simulated random variable, so that the accepted random variables are distributed according to the target distribution. Importance sampling also involves generating random variables under the proposal distribution and then creating a weighted average, such that the weighted average represents the expectation under the target distribution of a random quantity of interest.
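The contrast between the two schemes can be made concrete with a toy problem (my own example, not the population-genetics model itself): target density ∝ e^{−σx} on [0, 1] with a uniform proposal, estimating E[X] both ways against the closed-form answer.

```python
import math
import random

rng = random.Random(0)
sigma = 3.0

def target(x):
    # unnormalized target density on [0, 1]
    return math.exp(-sigma * x)

# Rejection: accept x ~ Uniform(0,1) with probability target(x)/sup(target) = target(x).
accepted = []
while len(accepted) < 2000:
    x = rng.random()
    if rng.random() <= target(x):
        accepted.append(x)
rejection_mean = sum(accepted) / len(accepted)

# Importance sampling: keep every draw, weight it by the density ratio.
xs = [rng.random() for _ in range(2000)]
ws = [target(x) for x in xs]
is_mean = sum(w * x for w, x in zip(ws, xs)) / sum(ws)

# Closed-form E[X] for this target: (1 - (1+sigma)e^{-sigma}) / (sigma(1 - e^{-sigma}))
exact = (1.0 - (1.0 + sigma) * math.exp(-sigma)) / (sigma * (1.0 - math.exp(-sigma)))
print(round(rejection_mean, 3), round(is_mean, 3), round(exact, 3))
```

Both estimators agree with the exact value; the difference is that rejection discards draws while importance sampling reweights them, which matters when the acceptance rate is tiny.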
A good proposal distribution should have the following two properties:

1. It should be easy to simulate data and calculate probabilities of interest with respect to the proposal distribution.
2. The proposal distribution should be, in some sense, close to the target distribution.

In DNJ (2001) the neutral model is the proposal distribution and the model with selection is the target distribution. While the neutral model has property 1, it does not have property 2: it is a bad proposal distribution.

Computation of the Normalization Constant when Σ is Diagonal

We consider the special case where Σ is a diagonal matrix. Denote the entries of the diagonal by Σ = (σ_1, σ_2, ..., σ_K). The normalization constant for the distribution can be calculated by a series of recursive integrals. Define α_i = θν_i − 1. Then

c(σ, θν) = ∫_0^1 x_1^{α_1} e^{−σ_1 x_1} ∫_0^{1−x_1} x_2^{α_2} e^{−σ_2 x_2} ··· ∫_0^{1−Σ_{i=1}^{K−3} x_i} x_{K−2}^{α_{K−2}} e^{−σ_{K−2} x_{K−2}} ∫_0^{1−Σ_{i=1}^{K−2} x_i} x_{K−1}^{α_{K−1}} e^{−σ_{K−1} x_{K−1}} g_K(1 − Σ_{i=1}^{K−1} x_i) dx_{K−1} ··· dx_1,

where g_K(y) = y^{α_K} e^{−σ_K y}. Setting y = 1 − x_1 − x_2 − ··· − x_{K−2} and t = x_{K−1}, the innermost integral becomes

∫_0^y t^{α_{K−1}} (y − t)^{α_K} e^{−σ_{K−1} t} e^{−σ_K (y−t)} dt = ∫_0^y t^{α_{K−1}} e^{−σ_{K−1} t} g_K(y − t) dt = g_{K−1}(y).
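The backward recursion for the g_i can be implemented directly on a grid. A sketch (my own discretization choices: uniform grid plus trapezoid rule, not the paper's numerics), with a sanity check against the σ = 0 case, where the constant reduces to a simplex/Dirichlet integral:

```python
import math

def normalization_constant(alphas, sigmas, n=400):
    """c(sigma, theta*nu) = g_1(1) via the backward recursion
    g_i(y) = integral_0^y t^{alpha_i} e^{-sigma_i t} g_{i+1}(y - t) dt,
    starting from g_K(y) = y^{alpha_K} e^{-sigma_K y} (trapezoid rule on a grid)."""
    K = len(alphas)
    h = 1.0 / n
    ts = [j * h for j in range(n + 1)]
    g = [y ** alphas[-1] * math.exp(-sigmas[-1] * y) for y in ts]   # g_K on the grid
    for i in range(K - 2, -1, -1):                                  # fold in g_{K-1}, ..., g_1
        f = [t ** alphas[i] * math.exp(-sigmas[i] * t) for t in ts]
        g = [
            (0.5 * (f[0] * g[j] + f[j] * g[0])
             + sum(f[k] * g[j - k] for k in range(1, j))) * h if j else 0.0
            for j in range(n + 1)
        ]
    return g[n]   # g_1(1)

# Sanity check: with every alpha_i = 0 and sigma_i = 0 the constant is the
# volume of the (K-1)-simplex, 1/(K-1)!; for K = 3 that volume is 0.5.
c3 = normalization_constant([0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
print(round(c3, 6))
```

Each fold is a convolution; this naive version costs O(n²) per level, whereas an FFT-based convolution would bring it down to O(n log n), consistent with the complexity quoted later in the talk.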
The integral is iteratively defined:

c(σ, θν) = ∫_0^1 x_1^{α_1} e^{−σ_1 x_1} ∫_0^{1−x_1} x_2^{α_2} e^{−σ_2 x_2} ··· ∫_0^{1−Σ_{i=1}^{K−3} x_i} x_{K−2}^{α_{K−2}} e^{−σ_{K−2} x_{K−2}} g_{K−1}(1 − Σ_{i=1}^{K−2} x_i) dx_{K−2} ··· dx_1.

Now let y = 1 − x_1 − x_2 − ··· − x_{i−1} and t = x_i, with α_i = θν_i − 1; the successive integrals are defined by

g_i(y) = ∫_0^y t^{α_i} e^{−σ_i t} g_{i+1}(y − t) dt, (10)

for i = K−1, K−2, ..., 1. The required c(σ, θν) is given by g_1(1).

Lyme disease sample

The following data were collected by Qiu et al. (1997), Hereditas 17: 3-16, on B. burgdorferi (the cause of Lyme disease) from eastern Long Island, New York.

[table: allele relative frequencies and counts; values lost in transcription]

The maximum likelihood estimates are θ̂ = 5 and σ̂ = 36. A total of 10^6 repetitions per θ were used in DNJ (2001).

Constant of Integration

s_m = c_m(36, 5ν) / c_m(36, 5(1, 1, 1, 1)/4). [table: approximations, scaled by 10^7, lost in transcription] The time complexity for computing m values of g_i(y) is O(m log(m)).

Likelihood Surface: Lyme Disease Data. [figure]

Simulated Data

A simulated data set from Xu, where K = 20, θ = 15 and σ = 65. The relative allele frequencies are

x = (.9, .814, .146, .87, .45, .46, .131, .185, .578, .59, .139, .167, .169, .183, .34, .91, .159, .1376, .869, .6). (11)

The original simulation from Xu was performed using the DNJ (2001) rejection method, and a large number of simulations were required before the data set was accepted.
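The inverse-CDF sampler built from the g_i can be sketched in the simplest case K = 2, where only X_1 must be drawn and X_2 = 1 − X_1. Quadrature and bisection below stand in for whatever numerics the paper actually uses, and the parameter values α = (1, 1), σ = (4, 0) are assumptions for illustration:

```python
import math
import random

def sample_x1(alphas, sigmas, rng, n=200, bisections=30):
    """Draw X_1 for K = 2 by inverting the CDF proportional to
    integral_0^y t^{a1} e^{-s1 t} (1-t)^{a2} e^{-s2 (1-t)} dt,
    using the midpoint rule for the integral and bisection for the inverse."""
    a1, a2 = alphas
    s1, s2 = sigmas

    def integrand(t):
        return t ** a1 * math.exp(-s1 * t) * (1.0 - t) ** a2 * math.exp(-s2 * (1.0 - t))

    def raw_cdf(u):
        h = u / n
        return h * sum(integrand((k + 0.5) * h) for k in range(n))

    total = raw_cdf(1.0)
    u = rng.random()
    lo, hi = 0.0, 1.0
    for _ in range(bisections):
        mid = 0.5 * (lo + hi)
        if raw_cdf(mid) / total < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = random.Random(0)
xs = [sample_x1([1.0, 1.0], [4.0, 0.0], rng) for _ in range(200)]
emp_mean = sum(xs) / len(xs)
print(round(emp_mean, 3))   # X_2 = 1 - X_1 completes each frequency vector
```

Every draw is accepted, which is the point of the construction: the cost is deterministic numerical work rather than an unbounded number of rejections.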
Likelihood surface: simulated data

s_m = c_m(67.5, 1.65ν) / c_m(67.5, 1.65(1, 1, ..., 1)/20). [table: approximations, scaled by 10^9, lost in transcription]

New method for simulating samples under selection

Define the following cumulative distribution functions with parameter z, F_i(·; z) for i = 1, 2, ..., K, where

F_i(y; z) = (∫_0^y t^{α_i} exp(−σ_i t) g_{i+1}(z − t) dt) / g_i(z) for y ≤ z, and F_i(y; z) = 1 for y > z,

and g_i(y) is defined by (10).

Generating allele frequencies under selection:

1. Generate U_i ∼ UNIF[0, 1].
2. Define X_i = F_i^{−1}(U_i; 1 − X_1 − X_2 − ··· − X_{i−1}).

Note that P(X_i ≤ y | X_{i−1}, ..., X_1) = F_i(y; 1 − X_1 − ··· − X_{i−1}).

Parametric Bootstrap

[tables: bootstrap means and standard deviations of θ̂ and σ̂ for the Lyme disease data and for the simulated data; numeric values lost in transcription] The two tables represent estimates of the mean and standard deviation for the maximum likelihood estimates θ̂ and σ̂ based on the parametric bootstrap procedure.

Conclusions

Importance sampling and the rejection method are powerful tools for modern likelihood-based statistical analysis. DNJ (2001) use this approach for the analysis of a class of nonneutral population genetics models. The efficiency of the above-mentioned procedures depends critically on the choice of the proposal distribution. Our method generates data directly under the model with selection and so is much more efficient than the methods described in DNJ (2001).
arXiv v1 [stat.AP] 9 Oct 2009. The Annals of Applied Statistics 2009, Vol. 3, No. 3, 1147-1162. DOI: 10.1214/09-AOAS237. © Institute of Mathematical Statistics, 2009. MAXIMUM LIKELIHOOD ESTIMATES UNDER K-ALLELE MODELS WITH SELECTION CAN
Statistical Methods Warsaw School of Economics November 3, 2017 Statistical hypothesis is the name of any conjecture about unknown parameters of a population distribution. The hypothesis should be verifiable
More informationProbability Theory and Statistics. Peter Jochumzen
Probability Theory and Statistics Peter Jochumzen April 18, 2016 Contents 1 Probability Theory And Statistics 3 1.1 Experiment, Outcome and Event................................ 3 1.2 Probability............................................
More informationBrief Review on Estimation Theory
Brief Review on Estimation Theory K. Abed-Meraim ENST PARIS, Signal and Image Processing Dept. abed@tsi.enst.fr This presentation is essentially based on the course BASTA by E. Moulines Brief review on
More informationBy Paul A. Jenkins and Yun S. Song, University of California, Berkeley July 26, 2010
PADÉ APPROXIMANTS AND EXACT TWO-LOCUS SAMPLING DISTRIBUTIONS By Paul A. Jenkins and Yun S. Song, University of California, Berkeley July 26, 2010 For population genetics models with recombination, obtaining
More informationMS 3011 Exercises. December 11, 2013
MS 3011 Exercises December 11, 2013 The exercises are divided into (A) easy (B) medium and (C) hard. If you are particularly interested I also have some projects at the end which will deepen your understanding
More informationMinimum Hellinger Distance Estimation in a. Semiparametric Mixture Model
Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.
More information1 A simple example. A short introduction to Bayesian statistics, part I Math 217 Probability and Statistics Prof. D.
probabilities, we ll use Bayes formula. We can easily compute the reverse probabilities A short introduction to Bayesian statistics, part I Math 17 Probability and Statistics Prof. D. Joyce, Fall 014 I
More informationTail bound inequalities and empirical likelihood for the mean
Tail bound inequalities and empirical likelihood for the mean Sandra Vucane 1 1 University of Latvia, Riga 29 th of September, 2011 Sandra Vucane (LU) Tail bound inequalities and EL for the mean 29.09.2011
More informationSTAT 512 sp 2018 Summary Sheet
STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}
More informationMaximum Smoothed Likelihood for Multivariate Nonparametric Mixtures
Maximum Smoothed Likelihood for Multivariate Nonparametric Mixtures David Hunter Pennsylvania State University, USA Joint work with: Tom Hettmansperger, Hoben Thomas, Didier Chauveau, Pierre Vandekerkhove,
More informationLecture 5: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics. 1 Executive summary
ECE 830 Spring 207 Instructor: R. Willett Lecture 5: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics Executive summary In the last lecture we saw that the likelihood
More informationThe mathematical challenge. Evolution in a spatial continuum. The mathematical challenge. Other recruits... The mathematical challenge
The mathematical challenge What is the relative importance of mutation, selection, random drift and population subdivision for standing genetic variation? Evolution in a spatial continuum Al lison Etheridge
More informationStatistical Inference of Covariate-Adjusted Randomized Experiments
1 Statistical Inference of Covariate-Adjusted Randomized Experiments Feifang Hu Department of Statistics George Washington University Joint research with Wei Ma, Yichen Qin and Yang Li Email: feifang@gwu.edu
More informationLecture 2. (See Exercise 7.22, 7.23, 7.24 in Casella & Berger)
8 HENRIK HULT Lecture 2 3. Some common distributions in classical and Bayesian statistics 3.1. Conjugate prior distributions. In the Bayesian setting it is important to compute posterior distributions.
More informationSYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions
SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu
More informationf (1 0.5)/n Z =
Math 466/566 - Homework 4. We want to test a hypothesis involving a population proportion. The unknown population proportion is p. The null hypothesis is p = / and the alternative hypothesis is p > /.
More informationAsymptotics for posterior hazards
Asymptotics for posterior hazards Pierpaolo De Blasi University of Turin 10th August 2007, BNR Workshop, Isaac Newton Intitute, Cambridge, UK Joint work with Giovanni Peccati (Université Paris VI) and
More informationFinal Examination Statistics 200C. T. Ferguson June 11, 2009
Final Examination Statistics 00C T. Ferguson June, 009. (a) Define: X n converges in probability to X. (b) Define: X m converges in quadratic mean to X. (c) Show that if X n converges in quadratic mean
More informationS6880 #7. Generate Non-uniform Random Number #1
S6880 #7 Generate Non-uniform Random Number #1 Outline 1 Inversion Method Inversion Method Examples Application to Discrete Distributions Using Inversion Method 2 Composition Method Composition Method
More informationBTRY 4830/6830: Quantitative Genomics and Genetics
BTRY 4830/6830: Quantitative Genomics and Genetics Lecture 23: Alternative tests in GWAS / (Brief) Introduction to Bayesian Inference Jason Mezey jgm45@cornell.edu Nov. 13, 2014 (Th) 8:40-9:55 Announcements
More informationMultivariate Statistics
Multivariate Statistics Chapter 2: Multivariate distributions and inference Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2016/2017 Master in Mathematical
More informationLecture 4: Probabilistic Learning
DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods
More informationTopic 12 Overview of Estimation
Topic 12 Overview of Estimation Classical Statistics 1 / 9 Outline Introduction Parameter Estimation Classical Statistics Densities and Likelihoods 2 / 9 Introduction In the simplest possible terms, the
More informationStatistical Inference
Statistical Inference Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Spring, 2006 1. DeGroot 1973 In (DeGroot 1973), Morrie DeGroot considers testing the
More informationStatistical population genetics
Statistical population genetics Lecture 7: Infinite alleles model Xavier Didelot Dept of Statistics, Univ of Oxford didelot@stats.ox.ac.uk Slide 111 of 161 Infinite alleles model We now discuss the effect
More informationNonparametric Drift Estimation for Stochastic Differential Equations
Nonparametric Drift Estimation for Stochastic Differential Equations Gareth Roberts 1 Department of Statistics University of Warwick Brazilian Bayesian meeting, March 2010 Joint work with O. Papaspiliopoulos,
More information