Estimating Microbial Diversity. John Bunge Department of Statistical Science Cornell University


1 Estimating Microbial Diversity. John Bunge, Department of Statistical Science, Cornell University

2 Thanks to: Amy Willis, Fiona Walsh, David Mark Welch, and colleagues too numerous to mention. Bunge, J., Willis, A. and Walsh, F. (2013) Estimating the number of species in microbial diversity studies. Ann. Rev. of Statist. and its Appl. v.1. Forthcoming.

3 Statisticians

4 Bioinformaticists

5 Statistics is not a collection of formulae, nor computer programs, but a conceptual framework, an intellectual stance, a point of view, a theory of knowledge. Fundamental idea: distinction between sample and population. Classical or frequentist statistics is fundamentally dualistic.

6 Plato's Republic, VII, 7: "Behold! human beings living in an underground den, which has a mouth open towards the light and reaching all along the den; here they have been from their childhood [...] Above and behind them a fire is blazing at a distance, [...] you will see, if you look, a low wall built along the way, like the screen which marionette players have in front of them, over which they show the puppets. [...] They see only their own shadows, or the shadows of one another, which the fire throws on the opposite wall of the cave [...] To them, I said, the truth would be literally nothing but the shadows of the images."

7 Old Testament, Ecclesiastes 1:15: "What is crooked cannot be straightened; what is lacking cannot be counted." New Testament, 1 Corinthians 13:12: "For now we see through a glass, darkly, but then face to face: now I know in part; but then shall I know even as also I am known."

8 The knowledge problem in microbiome studies. "Metagenomics is the study of metagenomes, genetic material recovered directly from environmental samples." (Wikipedia) "DNA extraction bias notwithstanding, metagenomics is the most unrestricted and comprehensive approach. Our ability to interpret these data is always improving, and we stand on a precipice of unprecedented discovery [...] Microbes are not the only group to benefit from these surveys; viruses exist at 10 times the abundance of microbes [...]." (Gilbert, 2011) BUT: METAGENOMIC SURVEYS RECOVER ONLY A SMALL FRACTION OF THE EXTANT DIVERSITY. NONETHELESS, MANY METHODS TREAT THE OBSERVED SAMPLE AS THE POPULATION.

9 MACHINES. The fundamental idea of statistics: distinction between Population (or universe) and Sample (or data)

10 THE SAMPLE IS A SUBSET OF THE POPULATION
Population: universe, reality, state of nature, truth; described by parameters.
Sample: finite, random; subject to noise, error, perturbation, shock; summarized by statistics.
Statistical inference: extract maximum information from the sample in order to draw conclusions about the population. Inductive, not deductive.

11 Question: in a microbial diversity study, what is the population?
o Collect 1 L of water at 500 m depth in the ocean
o From the 1 L, remove 5 ml and exhaustively sequence microbial DNA
o Cluster sequences into OTUs
o From OTUs, calculate frequency count data
o Compute estimate of total species richness
Question: richness of what population? The original 1 L of water? The surrounding environment? The entire pelagic microbiome?
Definition: the population is what would be observed if the operative sampling and analysis protocols were carried out to infinite effort.

12 How do we statistically estimate total microbial taxonomic richness?

13 Physical DNA sample -> next-generation sequencing and bioinformatic preprocessing -> collection of sequences -> bioinformatic processing: alignment, clustering, counting. Cluster sequences at some % identity, typically 97%. {clusters} = {OTUs}; OTU = operational taxonomic unit.

14 Statistical problem: estimate total population diversity (number of species, classes, taxa, OTUs) based on frequency count data. Data = # of units observed exactly once in the sample (singletons); # observed exactly twice (doubletons); # observed exactly three times; and so on.
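As a minimal sketch of what "frequency count data" means in practice, the snippet below tabulates f_j (the number of OTUs seen exactly j times) from a vector of per-OTU abundances. The abundance vector and the function name `frequency_counts` are made up for illustration.

```python
from collections import Counter

def frequency_counts(otu_abundances):
    """Return {j: f_j}, where f_j is the number of OTUs observed exactly j times."""
    # OTUs with abundance 0 were not observed, so they cannot be tabulated.
    observed = [a for a in otu_abundances if a > 0]
    return dict(sorted(Counter(observed).items()))

# Hypothetical abundance vector: one entry per OTU in the sample.
abundances = [1, 1, 1, 1, 2, 2, 3, 5, 5, 40]
f = frequency_counts(abundances)
print(f)  # {1: 4, 2: 2, 3: 1, 5: 2, 40: 1}
```

Here f[1] = 4 is the number of singletons and f[2] = 2 the number of doubletons; the unobserved count f_0 is exactly what the estimators discussed later try to recover.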

15 Frequency count data example. Microbial ecology: Fiona Walsh et al. Data from soil in apple orchards; use of antibiotics on bacterial populations in soil ecosystems. Singletons ~ 2x doubletons; may be 10x! Goal is to estimate the taxonomic richness of the community, and its change with respect to intervention/covariates/metadata. [Table: frequency and count, in two column pairs; entries not preserved.] Walsh F, Owens S, Duffy B, Smith DP, Frey JE. Streptomycin use in apple orchards did not alter the soil bacterial communities.

16 [Figure: apple orchard data, count vs. frequency, on the original scale and on the log scale.] Issues: high diversity, typical of microbial data. Singletons ~ 2x doubletons. Data acquisition / bioinformatic issues: spurious singletons? Correct at what stage? Statistical approach?

17 Statistical inference from frequency count data: STANDARD MODEL. C classes/taxa/species in the population. Each species independently contributes a Poisson-distributed # of representatives to the sample: $X_1 \sim \text{Poisson}(\lambda_1)$, $X_2 \sim \text{Poisson}(\lambda_2)$, $X_3 \sim \text{Poisson}(\lambda_3)$, ..., $X_C \sim \text{Poisson}(\lambda_C)$. Counts ~ zero-truncated mixed Poisson.

18 The mixed-Poisson model. Species (taxon) i contributes a Poisson-distributed number $X_i$ of replicates to the sample, i.e., taxon i appears in the sample $X_i$ times. Units appear independently in the sample. Fundamental problem: heterogeneity, i.e., unequal Poisson means $\lambda_i$. Standard approach: model the $\lambda_i$'s as i.i.d. replicates from some mixing distribution F. The counts $X_i$ are then marginally i.i.d. F-mixed Poisson random variables, zero-truncated since zero counts $X_i$ are unobservable.
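To make the sampling mechanism concrete, here is a small simulation sketch of the standard model with a gamma mixing distribution (one of the parametric choices for F). The parameter values, the seed, and the helper names are assumptions chosen for illustration; the Poisson sampler is Knuth's multiplication method, which is adequate only for small means.

```python
import math
import random
from collections import Counter

def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate by Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate_sample(C, shape, scale, seed=0):
    """Simulate C taxa: lambda_i ~ Gamma(shape, scale), X_i ~ Poisson(lambda_i)."""
    rng = random.Random(seed)
    counts = [poisson_draw(rng.gammavariate(shape, scale), rng) for _ in range(C)]
    observed = [x for x in counts if x > 0]  # taxa with X_i = 0 are never seen
    return len(observed), dict(sorted(Counter(observed).items()))

# Hypothetical population of 2000 taxa with strongly unequal intensities.
n_observed, f = simulate_sample(C=2000, shape=0.5, scale=2.0)
print("observed taxa:", n_observed, "of 2000")
print("singletons:", f.get(1, 0), "doubletons:", f.get(2, 0))
```

Only the zero-truncated counts reach the analyst: the simulation knows C = 2000, but the frequency counts alone do not reveal how many taxa had X_i = 0.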

19 The mixed-Poisson model, cont'd. The mixing distribution F, i.e., the distribution of sampling intensities λ, is also called the species abundance distribution. Probably a misnomer. The mathematical treatment (marginalization) implies that each species' contribution to the sample is independent and identically distributed. Both assumptions are certainly wrong. How to account for dependent or differently distributed species counts? Not in the standard model.

20 Mixing distributions F. Parametric, with a low-dimensional parameter vector:
o None: point mass at λ, all species sizes equal
o Gamma (Fisher, 1943)
o Lognormal
o Inverse Gaussian, generalized inverse Gaussian (Sichel)
o Pareto
o Log-t
o Stable
o Finite mixture of exponentials (semiparametric)

21 Richness estimation under the Poisson model. The diversity estimate is then $\hat{N}_F := \dfrac{\#\,\text{taxa in sample}}{1 - P_F(0)}$, where $P_F(0)$ = F-mixed Poisson probability of 0: $P_F(0) = \int e^{-\lambda}\, dF(\lambda) = E_F\!\left(e^{-\Lambda}\right)$. $\hat{N}_F$ is the Horvitz-Thompson estimator (HTE) and is uniformly minimum variance unbiased (UMVU). Require an empirical version of $\hat{N}_F$, i.e., require an estimate of $P_F(0)$ (frequentist version).

22 Richness estimation under the Poisson model, cont'd. Require an empirical version of the HTE $\hat{N}_F := \dfrac{\#\,\text{taxa in sample}}{1 - P_F(0)}$, with $F = F(\lambda, \theta)$. Estimate θ by ML, using the zero-truncated F-mixed Poisson, conditional on the # of observed taxa. Final estimator: $\hat{N}_F := \dfrac{\#\,\text{taxa in sample}}{1 - P_F(0, \hat{\theta})}$. SE via Fisher information; CI via (approximation to) profile likelihood.
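The simplest instance of this recipe takes F to be a point mass (all taxa share one λ), so that $P_F(0) = e^{-\lambda}$. The sketch below, under that assumption only, finds the zero-truncated Poisson MLE by bisection and plugs it into the Horvitz-Thompson form. The frequency counts are invented, and no standard error or profile-likelihood interval is attempted.

```python
import math

def poisson_hte(freq_counts, tol=1e-12):
    """Horvitz-Thompson richness estimate under a homogeneous Poisson model."""
    s_obs = sum(freq_counts.values())                  # taxa in sample
    n = sum(j * fj for j, fj in freq_counts.items())   # individuals in sample
    mean_pos = n / s_obs
    if mean_pos <= 1.0:
        raise ValueError("all taxa are singletons; the MLE of lambda is 0")
    # Zero-truncated Poisson MLE solves lam / (1 - exp(-lam)) = mean_pos.
    # The left side is increasing in lam and exceeds lam, so the root is in (0, mean_pos).
    lo, hi = 1e-9, mean_pos
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        denom = -math.expm1(-mid)                      # 1 - exp(-mid), computed stably
        if mid / denom < mean_pos:
            lo = mid
        else:
            hi = mid
    lam_hat = 0.5 * (lo + hi)
    p0_hat = math.exp(-lam_hat)                        # estimated P_F(0)
    return s_obs / (1.0 - p0_hat), lam_hat

# Hypothetical frequency counts, roughly Poisson-shaped.
f = {1: 270, 2: 270, 3: 180, 4: 90, 5: 36}
n_hat, lam_hat = poisson_hte(f)
print(round(n_hat, 1), round(lam_hat, 3))
```

With heterogeneous intensities this equal-abundance estimate tends to be too low, which is why richer mixing distributions F are fitted in practice.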

23 CatchAll software, or: STAMPS! Developed under NSF grant DEB by JB/LW/SC, in C# & C. Implements:
o finite mixtures of 0-4 exponential components (F)
o weighted linear regression procedure
o all Chao-type nonparametric procedures
o model evaluation/gof/selection/outlier assessment
Produces estimates, SEs, & CIs. Fast, efficient, platform-independent. Excel graphics (VBA) package. Summary or copious output (text files). Bunge J, Woodard L, Böhning D, Foster JA, Connolly S, Allen HK. 2012b. Estimating population diversity with CatchAll. Bioinformatics 28:

24 Partial CatchAll summary output for apple orchard data. Total Number of Observed Species = 1187. [Table: columns Model, Tau, Observed Sp, Estimated Total Sp, SE, Lower CB, Upper CB, GOF0, GOF5; rows Best Parm Model (ThreeMixedExp), Parm Model 2a (ThreeMixedExp), Parm Model 2b (ThreeMixedExp), Parm Model 2c (TwoMixedExp), WLRM (UnTransf), Parm Max Tau (ThreeMixedExp), WLRM Max Tau (LogTransf); numeric entries not preserved.]

25 [Figure: CatchAll fitted models for apple orchard data, counts vs. frequency. Series: Observed; Best: ThreeMixedExp; Other 1: ThreeMixedExp, Tau 118; Other 2: ThreeMixedExp, Tau 262; Other 3: TwoMixedExp, Tau 23.]

26 Data-analytic considerations
Problem of right cutoff point τ
o Typically no parametric model will fit the complete frequency count dataset
o Too many right outliers: highly abundant taxa in the sample with large gaps between counts
o Nonparametric methods do even worse with outliers, diverging to infinity as outliers are included in the data
Data-analytic solution: remove large frequency counts, for frequencies > some cutoff τ
o Chao1: τ = 2
o Chao-type coverage-based nonparametric methods: τ = 10 (arbitrary)
o Parametric mixture models: τ selected by goodness-of-fit algorithm
o Weighted linear regression model: τ selected by goodness-of-fit
Further problem: model selection and outlier deletion are confounded
o Computational solution: compute all methods at every τ
o Requires optimized code
o Use double selection algorithm to select the best of the best
o Introduces a simultaneous inference problem: large number of simultaneous GOF tests. Little theory exists to correct for this.

27 Statistical analysis of standard model: the bigger picture. Methods by philosophy/approach:
o Parametric, frequentist: maximum likelihood (Bunge et al.); weighted linear regression (Rocchetti et al. 2011)
o Parametric, Bayesian: objective Bayes (Barger et al.; Quince et al.)
o Nonparametric, frequentist: coverage-based (Chao et al.); Zelterman; NPMLE (Böhning et al.)
o Nonparametric, Bayesian: ??? (Tardella et al. for capture-recapture)

28 Statistical analysis of standard model: Chao-type nonparametrics. Coverage-based approaches. Coverage = proportion of population represented in sample. Random variable, not parameter. Can interpret $1 - P_F(0)$ as a surrogate for coverage. Turing's estimate of $P_F(0)$: $f_1/n$, where n = # of individual units in sample. Good-Turing estimate of diversity: $\dfrac{\#\,\text{taxa in sample}}{1 - f_1/n}$. Chao's abundance-based coverage estimators (ACE): Good-Turing + adjustment for heterogeneity. Chao, A. & J. Bunge. Estimating the number of species in a stochastic abundance model. Biometrics 58:
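The Good-Turing estimate on this slide needs only the frequency counts, so it is easy to sketch. The function below follows the formula as stated (observed taxa divided by estimated coverage $1 - f_1/n$); the example counts are hypothetical, and the heterogeneity adjustment used by ACE is deliberately left out.

```python
def good_turing_richness(freq_counts):
    """Good-Turing richness estimate: (# taxa in sample) / (1 - f_1 / n)."""
    s_obs = sum(freq_counts.values())                  # taxa in sample
    n = sum(j * fj for j, fj in freq_counts.items())   # individual units in sample
    f1 = freq_counts.get(1, 0)                         # singletons
    if f1 == n:
        raise ValueError("every unit is a singleton; estimated coverage is zero")
    coverage = 1.0 - f1 / n                            # Turing's coverage estimate
    return s_obs / coverage, coverage

# Hypothetical frequency counts: 82 observed taxa, 140 individual units.
f = {1: 50, 2: 20, 3: 10, 10: 2}
estimate, coverage = good_turing_richness(f)
print(round(estimate, 2), round(coverage, 3))  # 127.56 0.643
```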

29 Coverage-based estimators diverge to infinity as large frequency counts are included. [Figure: estimated count vs. Tau. Series: Observed Sp; Est Sp for NonParametric, Poisson, SingleExp, TwoMixedExp, ThreeMixedExp and FourMixedExp models.] Hence coverage-based estimators require τ.

30 Statistical analysis of standard model: general nonparametrics. Nonparametric maximum likelihood estimation: leave the species abundance distribution F unspecified, i.e., F varies across all possible distributions. Mathematical implications: F is actually non-identifiable. Nevertheless NPMLE is possible in principle. Computational issues: difficult numerical search, highly complex error estimation. Software: CAMCR. Böhning D, Kuhnert R. CAMCR: Computer-Assisted Mixture model analysis for Capture-Recapture count data. AStA Adv. Stat. Anal. 93:

31 The Bayesian paradigm (Rev. Thomas Bayes). Bayesian statistics: probabilistic & statistical statements concern degrees of belief. Usually parametric: statements concern values of parameters, e.g., species richness. Nonparametric Bayes is possible but complex. Procedure:
1. Investigator first declares existing belief about the population value: this is the prior distribution
2. Collect sample data
3. Update the prior, based on the data, to obtain the posterior, i.e., the final state of knowledge or belief about the population.

32 The Bayesian paradigm, cont'd. Bayes' Theorem: $P(B \mid A) = \dfrac{P(A \mid B)\, P(B)}{P(A)}$. Posterior distribution: $P(\text{parameters} \mid \text{data}) \propto P(\text{data} \mid \text{parameters})\, P(\text{parameters}) = \text{likelihood} \times \text{prior}$. Bayesian computation is now fairly well established.

33 Bayesian estimation of taxonomic richness based on the standard model. The species abundance distribution F is parametric: F depends on a small number of parameters (typically 2-3), called θ. The parameter of interest is total richness C. Procedure:
1. Establish prior distributions for θ and C
2. Likelihood function is known (based on mixed-Poisson)
3. Run Bayesian machinery
4. Obtain posterior distribution, estimate, credible interval, etc.
Quince et al.: quasi-noninformative priors; Barger et al.: formal objective priors. Active research area in statistics. Quince C, Curtis TP, Sloan WT. The rational exploration of microbial diversity. ISME J. 2: ; Barger K, Bunge J. Objective Bayesian estimation for the number of species. J. Bayesian Analysis 5:
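The four-step procedure can be sketched on a grid for the simplest case, where F is a point mass so that θ is a single intensity λ. Everything specific here is an assumption for illustration: flat priors on C (up to `c_max`) and on a λ grid, invented frequency counts, and made-up function names. These are not the priors of Quince et al. or Barger et al. The likelihood is the multinomial one implied by the standard model, which in (C, λ) is proportional to $\frac{C!}{(C-S)!} e^{-C\lambda} \lambda^{n}$, with S observed taxa and n observed individuals.

```python
import math

def richness_posterior(freq_counts, c_max, lam_grid):
    """Posterior of total richness C on a grid, with lambda summed out."""
    s_obs = sum(freq_counts.values())
    n = sum(j * fj for j, fj in freq_counts.items())
    c_values = list(range(s_obs, c_max + 1))
    log_post = []
    for c in c_values:
        # log of C! / (C - S)!, the part of the multinomial coefficient involving C
        log_coef = math.lgamma(c + 1) - math.lgamma(c - s_obs + 1)
        # log-likelihood up to a constant: log_coef - C*lam + n*log(lam)
        terms = [log_coef - c * lam + n * math.log(lam) for lam in lam_grid]
        peak = max(terms)
        log_post.append(peak + math.log(sum(math.exp(t - peak) for t in terms)))
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return c_values, [w / total for w in weights]

def credible_interval(c_values, post, level=0.95):
    """Equal-tailed credible interval from a discrete posterior."""
    tail = (1.0 - level) / 2.0
    cum, lower, upper = 0.0, None, None
    for c, p in zip(c_values, post):
        cum += p
        if lower is None and cum >= tail:
            lower = c
        if upper is None and cum >= 1.0 - tail:
            upper = c
    return lower, upper if upper is not None else c_values[-1]

# Hypothetical frequency counts; flat priors on C in [S, 1500] and lambda in [0.5, 4].
f = {1: 270, 2: 270, 3: 180, 4: 90, 5: 36}
lam_grid = [0.5 + 0.01 * k for k in range(351)]
c_values, post = richness_posterior(f, c_max=1500, lam_grid=lam_grid)
mode = c_values[post.index(max(post))]
lower, upper = credible_interval(c_values, post)
print(mode, (lower, upper))
```

A grid is workable only because this toy model has two unknowns; with a 2-3 parameter θ the "Bayesian machinery" would normally be a sampler rather than an exhaustive sum.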

34 A New Hope. Is it possible to estimate taxonomic richness without
o a species abundance distribution,
o independent species contributions to the sample,
o identically distributed species contributions to the sample?
Yes, using ratios of frequency counts.

35 breakaway: Estimating taxonomic richness based on ratios of frequency counts. [Figure: ratio plot for the apple orchard data, $(j+1) f_{j+1}/f_j$ against $j$.] Idea: the ratios are approximately linear. Project the line downward to obtain $f_0$ = # of unobserved species: $r(j) := \dfrac{(j+1)\, f_{j+1}}{f_j} = \alpha + \beta j$.
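The projection idea can be sketched with an ordinary least-squares line through the ratio plot. Setting j = 0 in $r(j)$ gives $f_1/f_0 = \alpha$, so the fitted intercept yields $\hat{f}_0 = f_1/\hat{\alpha}$. Unweighted least squares is an assumption made here for brevity; it ignores the heteroscedasticity and autocorrelation of the ratios, so this is not the weighted or nonlinear fitting used by the cited methods. The counts below are constructed to be exactly geometric so the answer is known.

```python
def ratio_plot_estimate(freq_counts, j_max):
    """Fit r(j) = (j+1) f_{j+1} / f_j = alpha + beta*j by OLS and project to j = 0."""
    js, rs = [], []
    for j in range(1, j_max + 1):
        fj, fj1 = freq_counts.get(j, 0), freq_counts.get(j + 1, 0)
        if fj > 0 and fj1 > 0:                 # a ratio needs two non-empty cells
            js.append(j)
            rs.append((j + 1) * fj1 / fj)
    if len(js) < 2:
        raise ValueError("need at least two ratios to fit a line")
    jbar = sum(js) / len(js)
    rbar = sum(rs) / len(rs)
    sxx = sum((j - jbar) ** 2 for j in js)
    beta = sum((j - jbar) * (r - rbar) for j, r in zip(js, rs)) / sxx
    alpha = rbar - beta * jbar
    if alpha <= 0:
        raise ValueError("fitted line is non-positive at j = 0")
    f0_hat = freq_counts[1] / alpha            # r(0) = f_1 / f_0 = alpha
    return f0_hat, alpha, beta

# Constructed example: f_j = 512 * 0.5**j, so the true f_0 is 512.
f = {1: 256, 2: 128, 3: 64, 4: 32, 5: 16}
f0_hat, alpha, beta = ratio_plot_estimate(f, j_max=4)
total_hat = f0_hat + sum(f.values())
print(f0_hat, alpha, beta, total_hat)  # 512.0 0.5 0.5 1008.0
```

The `alpha <= 0` guard corresponds to the failure mode where a straight-line fit goes negative.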

36 breakaway: Estimating taxonomic richness based on ratios of frequency counts, cont'd. Some issues: the straight-line fit may go negative! Can be fixed by an ad hoc log-transformation (Rocchetti et al.). Broad generalization: represent the ratio of frequency counts as a ratio of polynomials, $\dfrac{f_{j+1}}{f_j} = \dfrac{\beta_0 + \beta_1 j + \beta_2 j^2 + \beta_3 j^3 + \cdots}{1 + \alpha_1 j + \alpha_2 j^2 + \alpha_3 j^3 + \cdots}$. Deep probabilistic justification; corrects negativity. Rocchetti I, Bunge J, Böhning D. Population size estimation based upon ratios of recapture probabilities. Ann. Appl. Stat. 5: ; Willis A. and Bunge J. (2013) in prep.

37 breakaway: Estimating taxonomic richness based on ratios of frequency counts, cont'd. Program output:
################## Smoothed weights ##################
The best estimate of total diversity is 1800 with std error 256
The model employed was model_1_1
The function selected was f_{x+1}/f_{x} ~ (beta0+beta1*(x-xbar))/(1+alpha1*(x-xbar))
[Table: coefficient estimates and coefficient std errors for the beta and alpha coefficients; values not preserved.]

38 breakaway: Estimating taxonomic richness based on ratios of frequency counts, cont'd
Nonlinear regression
o Heteroscedastic (changing variance)
o Autocorrelated: $f_2/f_1$ is correlated with $f_3/f_2$, etc.
o Collinear: parameter estimates of the α's and β's highly correlated unless corrected
o Multiple significant numerical challenges
Statistical questions
o Model selection: degree of numerator and denominator polynomials
o Error estimation
o Underlying probability theory: what do these models imply, and what are they implied by?

39 Noise and unreliable low frequency counts. "Next generation sequencing technology [...] has revolutionised the study of microbial diversity as it is now possible to sequence a substantial fraction of the 16S rRNA genes in a community. However, [...] because of the large read numbers and the lack of consensus sequences it is vital to distinguish noise from true sequence diversity in this data. Otherwise this leads to inflated estimates of the number of types or operational taxonomic units (OTUs) present." (Quince et al., 2011)

40 Methods to address unreliable low frequency counts. I. Fix the data at the source! Example: PyroNoise and AmpliconNoise aim at separately removing 454 sequencing errors and PCR single base errors (Quince 2011). Direct, non-statistical approach.

41 Methods to address unreliable low frequency counts. II. Lower bounds for total richness (diversity). Good-Turing: $\dfrac{\#\,\text{observed species}}{1 - \#\,\text{singletons} / \text{total sample size}}$. Poisson model-based estimate. G-T & Poisson assume equal abundances. Chao1: slightly higher but still downwardly biased.
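Chao1 is named on the slide without its formula. A minimal sketch of the commonly used form, $S_{obs} + f_1^2/(2 f_2)$, is given below, with the bias-corrected variant $S_{obs} + f_1(f_1 - 1)/2$ as a fallback when there are no doubletons. The example counts are hypothetical.

```python
def chao1(freq_counts):
    """Chao1 nonparametric lower bound for total richness."""
    s_obs = sum(freq_counts.values())   # observed taxa
    f1 = freq_counts.get(1, 0)          # singletons
    f2 = freq_counts.get(2, 0)          # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2.0 * f2)
    # No doubletons: use the bias-corrected form, which stays finite.
    return s_obs + f1 * (f1 - 1) / 2.0

# Hypothetical frequency counts: 82 observed taxa.
f = {1: 50, 2: 20, 3: 10, 10: 2}
print(chao1(f))                 # 144.5
print(chao1({1: 4, 3: 1}))      # 11.0
```

Because only f_1 and f_2 enter the formula, inflated singleton counts feed directly into the bound, which is the concern raised on these slides.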

42 Methods to address unreliable low frequency counts. III. Deleting the high-diversity component of a mixture model. Bunge J, Böhning D, Allen H, Foster JA. 2012a. Estimating population diversity with unreliable low frequency counts. In Biocomputing 2012: Proceedings of the Pacific Symposium, pp Hackensack, NJ: World Sci. Publ

43 Methods to address unreliable low frequency counts. IV. Bayesian approaches. Informative or subjective: investigator specifies a non-trivial downweighting or rapidly decreasing prior for higher diversity values. Specific choice of prior?

44 Numerical results from viral phage data: lower bounds and component deletion. [Table: columns Method, EstDiv, SE, LCB, UCB; rows Poisson, GoodTuring, ThreeMixedExp, Discounted: TwoMixedExp; numeric entries not preserved.]

45 Some notes on β-diversity. Crucial to distinguish between:
o Statistical inference procedures that (attempt to) account for unobserved as well as observed diversity. Estimation of population parameters, possible hypothesis testing.
o Procedures (computational, graphical, or qualitative) that treat the observed sample as the population. UniFrac, ordination methods, co-inertia.
Only the former are considered here.

46 Statistical inference for comparing taxonomic diversity across populations. Simplest version: estimate richness in each population, with associated standard errors and confidence intervals, & compare (e.g., do CIs overlap?). Can be done with existing methods: parametric, nonparametric, Bayesian, etc. Exactly ONE known inferential procedure: a lower bound for the # of shared taxa, $\hat{S}_{12} = D_{12} + a\,\dfrac{f_{1+}^2}{2 f_{2+}} + b\,\dfrac{f_{+1}^2}{2 f_{+2}} + ab\,\dfrac{f_{11}^2}{4 f_{22}}$ ($D_{12}$ = observed # of shared species, $f_{jk}$ = # of species observed j times in sample 1 and k times in sample 2, a subscript + denotes summation over all positive counts in that sample, a and b = constants). Pan HY, Chao A, Foissner W. A nonparametric lower bound for the number of species shared by multiple communities. J. Agric. Biol. Environ. Stat. 14:
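The "simplest version" comparison can be sketched in a few lines. The symmetric normal-approximation interval used here is an assumption for illustration and is cruder than profile-likelihood intervals. The first estimate and standard error are the values 1800 and 256 shown in the breakaway output earlier; the second pair is hypothetical.

```python
def wald_interval(estimate, se, z=1.96):
    """Symmetric normal-approximation interval: estimate +/- z * se."""
    return estimate - z * se, estimate + z * se

def intervals_overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

ci_1 = wald_interval(1800.0, 256.0)   # estimate and SE from the breakaway output
ci_2 = wald_interval(2900.0, 300.0)   # hypothetical second population
print(ci_1, ci_2, intervals_overlap(ci_1, ci_2))
```

Non-overlapping intervals suggest a difference in richness, but overlap is not a formal test, and a symmetric interval can dip below the observed number of taxa, which a richness interval should not do.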

47 Statistical inference for β-diversity: other scenarios. Inference for the Jaccard index, accounting for unobserved species (Chao et al.). Inference for the probability of a draw from one distribution not being observed in k draws from another distribution (Hampton et al.). Statistical work in this area is not extensive: a very fertile area for research. Chao A, Chazdon RL, Colwell RK, Shen T-J. Abundance-based similarity indices and their estimation when there are unseen species in samples. Biometrics 62:361-71; Hampton J, Lladser ME. Estimation of distribution overlap of urn models. PLoS ONE 7:e

48 NEVER throw away data when doing statistical inference. Not even wrong. Richard Feynman

49 There is no post hoc statistical fix for:
o Ill-posed research problem
o Vaguely defined population
o Statistical model not appropriate for population description or for the sample generation process
Model must compromise between detailed phenomenological description and parsimony. "To what extent can we idealize the properties of the system and still obtain satisfactory results? The answer to this question can only be given in the end by experiment. Only the comparison of the answers provided by analysis of our model with the results of the experiment will enable us to judge whether the idealization is legitimate." Andronov (1937), Theory of Oscillators.

50 On the sociology of science. Fact: universities have statistics departments!
o Cornell:
o At least 131 university stat depts in U.S.; random sample of 10:
University of California, Berkeley, Division of Biostatistics
Princeton University, Program in Statistics and Operations Research
Bowling Green State University, Department of Applied Statistics and Operations Research
University of Illinois, Urbana-Champaign, Department of Statistics
University of South Carolina, Department of Statistics
Columbia School of Public Health, Division of Biostatistics
Medical College of Georgia, Office of Biostatistics and Bioinformatics
Duke University, Institute of Statistics and Decision Sciences
Yale University, Department of Statistics
University of Michigan, Department of Biostatistics
Collaboration is extremely valuable in both directions (even though the academic incentive structure may not immediately reward it). Be persistent: "Fall down seven times, get up eight."

51 CatchAll, or STAMPS! V.4 now available; mothur uses v.3 (?)
Two programs: basic analysis program + Excel graphics spreadsheet (macros)
o Windows GUI, Windows command-line: .NET Framework must be installed
o Mac OS/Linux command-line: Mono must be installed
Input data file structure: *.csv (comma-separated values), one frequency and its count per line:
1,f_1
2,f_2
...
m,f_m
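Producing a file in the two-column layout described above takes only the standard `csv` module. In this sketch the function name and the example counts are made up, and skipping frequencies whose count is zero is an assumption rather than a documented CatchAll requirement.

```python
import csv
import io

def write_frequency_csv(freq_counts, handle):
    """Write one 'j,f_j' line per frequency, in increasing order of j."""
    writer = csv.writer(handle, lineterminator="\n")
    for j in sorted(freq_counts):
        if freq_counts[j] > 0:          # assumption: omit empty frequencies
            writer.writerow([j, freq_counts[j]])

# Hypothetical frequency counts written to an in-memory buffer.
buffer = io.StringIO()
write_frequency_csv({1: 4, 2: 2, 5: 1}, buffer)
print(buffer.getvalue(), end="")
# 1,4
# 2,2
# 5,1
```

To write to disk instead, pass a handle opened with `open(path, "w", newline="")`.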

52 CatchAll cont'd. Read in data, then Go! (Can set an option to omit the most complex model, if too time-consuming; see manual.) Output files appear in the Output folder/directory:
o datasetname_analysis.csv: complete listing of all analyses
o datasetname_bestmodelsanalysis.csv: column-formatted summary analysis output
o datasetname_bestmodelsfits.csv: fitted values for the "best models" as selected by the model selection algorithm
o datasetname_bubbleplot.csv: data to generate bubble plots using the Excel spreadsheet

53 CatchAll cont'd: BestModelsAnalysis file
o Total number of observed species: self-explanatory
o Model: see manual
o Tau: upper frequency cutoff
o Observed Sp: number of species (counts) with frequencies up to τ only
o Estimated total Sp: final estimate of the total number of species in the population
o SE: standard error of the preceding estimate
o Lower CB, Upper CB: lower and upper 95% confidence bounds
o GOF0, GOF5: Pearson goodness-of-fit p-values, uncorrected and corrected

54 CatchAll cont'd: BestModelsAnalysis file
o Best Parm Model; Parm Model 2a, 2b, 2c: parametric models (and τ's) selected by various goodness-of-fit criteria
o WLRM: weighted linear regression model
o Parm Max Tau, WLRM Max Tau: best parametric model and WLRM computed on the entire dataset
o Best Discounted: best parametric model with the low frequency/high diversity component deleted
o Non P 1: Chao1, nonparametric lower bound for the total number of species
o Non P 2: Chao's ACE or high diversity variant ACE1 (τ = 10)
o Non P 3: Chao's ACE (τ = 10)

55 CatchAll cont'd: Analysis file
o All models & procedures computed by CatchAll, including several not reported in the summary analysis
o All cutoffs τ
o All supplementary/supporting information (GOF etc.)
Question: what if no best parametric model is selected?
o Means no model passed the most stringent GOF criteria
o Revert to alternative models (2a-c)
o If necessary revert to lower bounds (Chao1 etc.)
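The fallback order described in the answer above can be written as a simple preference list. This is a sketch over a hypothetical dictionary of row labels and estimates; it does not parse CatchAll's actual output files, and the numeric values are invented.

```python
def pick_reported_estimate(summary_rows):
    """Return the first available (label, estimate) in the fallback order."""
    preference = [
        "Best Parm Model",                                  # most stringent GOF criteria
        "Parm Model 2a", "Parm Model 2b", "Parm Model 2c",  # alternative models
        "Non P 1",                                          # Chao1 lower bound
    ]
    for label in preference:
        estimate = summary_rows.get(label)
        if estimate is not None:
            return label, estimate
    raise ValueError("no usable estimate in the summary rows")

# Hypothetical summary: no best model and no 2a, so the choice falls to 2b.
rows = {"Best Parm Model": None, "Parm Model 2a": None,
        "Parm Model 2b": 2450.0, "Non P 1": 1900.0}
label, estimate = pick_reported_estimate(rows)
print(label, estimate)  # Parm Model 2b 2450.0
```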


More information

Contents. Preface to Second Edition Preface to First Edition Abbreviations PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1

Contents. Preface to Second Edition Preface to First Edition Abbreviations PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1 Contents Preface to Second Edition Preface to First Edition Abbreviations xv xvii xix PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1 1 The Role of Statistical Methods in Modern Industry and Services

More information

y Xw 2 2 y Xw λ w 2 2

y Xw 2 2 y Xw λ w 2 2 CS 189 Introduction to Machine Learning Spring 2018 Note 4 1 MLE and MAP for Regression (Part I) So far, we ve explored two approaches of the regression framework, Ordinary Least Squares and Ridge Regression:

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

CPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017

CPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017 CPSC 340: Machine Learning and Data Mining MLE and MAP Fall 2017 Assignment 3: Admin 1 late day to hand in tonight, 2 late days for Wednesday. Assignment 4: Due Friday of next week. Last Time: Multi-Class

More information

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Sahar Z Zangeneh Robert W. Keener Roderick J.A. Little Abstract In Probability proportional

More information

Multivariate Survival Analysis

Multivariate Survival Analysis Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

More information

Sequential Monitoring of Clinical Trials Session 4 - Bayesian Evaluation of Group Sequential Designs

Sequential Monitoring of Clinical Trials Session 4 - Bayesian Evaluation of Group Sequential Designs Sequential Monitoring of Clinical Trials Session 4 - Bayesian Evaluation of Group Sequential Designs Presented August 8-10, 2012 Daniel L. Gillen Department of Statistics University of California, Irvine

More information

Lecture : Probabilistic Machine Learning

Lecture : Probabilistic Machine Learning Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning

More information

Experimental Design and Data Analysis for Biologists

Experimental Design and Data Analysis for Biologists Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1

More information

Subject CS1 Actuarial Statistics 1 Core Principles

Subject CS1 Actuarial Statistics 1 Core Principles Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and

More information

Niche Modeling. STAMPS - MBL Course Woods Hole, MA - August 9, 2016

Niche Modeling. STAMPS - MBL Course Woods Hole, MA - August 9, 2016 Niche Modeling Katie Pollard & Josh Ladau Gladstone Institutes UCSF Division of Biostatistics, Institute for Human Genetics and Institute for Computational Health Science STAMPS - MBL Course Woods Hole,

More information

The Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition.

The Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition. Christian P. Robert The Bayesian Choice From Decision-Theoretic Foundations to Computational Implementation Second Edition With 23 Illustrations ^Springer" Contents Preface to the Second Edition Preface

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

Bayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007

Bayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007 Bayesian inference Fredrik Ronquist and Peter Beerli October 3, 2007 1 Introduction The last few decades has seen a growing interest in Bayesian inference, an alternative approach to statistical inference.

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION City of origin as a confounding variable. The original study was designed such that the city where sampling was performed was perfectly confounded with where the DNA extractions and sequencing was performed.

More information

Bayesian Models in Machine Learning

Bayesian Models in Machine Learning Bayesian Models in Machine Learning Lukáš Burget Escuela de Ciencias Informáticas 2017 Buenos Aires, July 24-29 2017 Frequentist vs. Bayesian Frequentist point of view: Probability is the frequency of

More information

Nonparametric Bayesian Methods - Lecture I

Nonparametric Bayesian Methods - Lecture I Nonparametric Bayesian Methods - Lecture I Harry van Zanten Korteweg-de Vries Institute for Mathematics CRiSM Masterclass, April 4-6, 2016 Overview of the lectures I Intro to nonparametric Bayesian statistics

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

Foundations of Nonparametric Bayesian Methods

Foundations of Nonparametric Bayesian Methods 1 / 27 Foundations of Nonparametric Bayesian Methods Part II: Models on the Simplex Peter Orbanz http://mlg.eng.cam.ac.uk/porbanz/npb-tutorial.html 2 / 27 Tutorial Overview Part I: Basics Part II: Models

More information

A General Overview of Parametric Estimation and Inference Techniques.

A General Overview of Parametric Estimation and Inference Techniques. A General Overview of Parametric Estimation and Inference Techniques. Moulinath Banerjee University of Michigan September 11, 2012 The object of statistical inference is to glean information about an underlying

More information

In this chapter, we provide an introduction to covariate shift adaptation toward machine learning in a non-stationary environment.

In this chapter, we provide an introduction to covariate shift adaptation toward machine learning in a non-stationary environment. 1 Introduction and Problem Formulation In this chapter, we provide an introduction to covariate shift adaptation toward machine learning in a non-stationary environment. 1.1 Machine Learning under Covariate

More information

Introduction to Machine Learning Midterm Exam

Introduction to Machine Learning Midterm Exam 10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but

More information

Estimation of reliability parameters from Experimental data (Parte 2) Prof. Enrico Zio

Estimation of reliability parameters from Experimental data (Parte 2) Prof. Enrico Zio Estimation of reliability parameters from Experimental data (Parte 2) This lecture Life test (t 1,t 2,...,t n ) Estimate θ of f T t θ For example: λ of f T (t)= λe - λt Classical approach (frequentist

More information

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006 Hypothesis Testing Part I James J. Heckman University of Chicago Econ 312 This draft, April 20, 2006 1 1 A Brief Review of Hypothesis Testing and Its Uses values and pure significance tests (R.A. Fisher)

More information

GS Analysis of Microarray Data

GS Analysis of Microarray Data GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Section of Bioinformatics Department of Biostatistics and Applied Mathematics UT M. D. Anderson Cancer Center kabagg@mdanderson.org

More information

Algorithmisches Lernen/Machine Learning

Algorithmisches Lernen/Machine Learning Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines

More information

New Statistical Methods That Improve on MLE and GLM Including for Reserve Modeling GARY G VENTER

New Statistical Methods That Improve on MLE and GLM Including for Reserve Modeling GARY G VENTER New Statistical Methods That Improve on MLE and GLM Including for Reserve Modeling GARY G VENTER MLE Going the Way of the Buggy Whip Used to be gold standard of statistical estimation Minimum variance

More information

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University Phylogenetics: Bayesian Phylogenetic Analysis COMP 571 - Spring 2015 Luay Nakhleh, Rice University Bayes Rule P(X = x Y = y) = P(X = x, Y = y) P(Y = y) = P(X = x)p(y = y X = x) P x P(X = x 0 )P(Y = y X

More information

Overall Plan of Simulation and Modeling I. Chapters

Overall Plan of Simulation and Modeling I. Chapters Overall Plan of Simulation and Modeling I Chapters Introduction to Simulation Discrete Simulation Analytical Modeling Modeling Paradigms Input Modeling Random Number Generation Output Analysis Continuous

More information

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation Biost 58 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 5: Review Purpose of Statistics Statistics is about science (Science in the broadest

More information

Journal of Statistical Software

Journal of Statistical Software JSS Journal of Statistical Software April 2011, Volume 40, Issue 9. http://www.jstatsoft.org/ SPECIES: An R Package for Species Richness Estimation Ji-Ping Wang Northwestern University Abstract We introduce

More information

How to predict the probability of a major nuclear accident after Fukushima Da

How to predict the probability of a major nuclear accident after Fukushima Da How to predict the probability of a major nuclear accident after Fukushima Dai-ichi? CERNA Mines ParisTech March 14, 2012 1 2 Frequentist approach Bayesian approach Issues 3 Allowing safety progress Dealing

More information

PMR Learning as Inference

PMR Learning as Inference Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning

More information

Some general observations.

Some general observations. Modeling and analyzing data from computer experiments. Some general observations. 1. For simplicity, I assume that all factors (inputs) x1, x2,, xd are quantitative. 2. Because the code always produces

More information

Bayesian Approach 2. CSC412 Probabilistic Learning & Reasoning

Bayesian Approach 2. CSC412 Probabilistic Learning & Reasoning CSC412 Probabilistic Learning & Reasoning Lecture 12: Bayesian Parameter Estimation February 27, 2006 Sam Roweis Bayesian Approach 2 The Bayesian programme (after Rev. Thomas Bayes) treats all unnown quantities

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of

More information

Cluster investigations using Disease mapping methods International workshop on Risk Factors for Childhood Leukemia Berlin May

Cluster investigations using Disease mapping methods International workshop on Risk Factors for Childhood Leukemia Berlin May Cluster investigations using Disease mapping methods International workshop on Risk Factors for Childhood Leukemia Berlin May 5-7 2008 Peter Schlattmann Institut für Biometrie und Klinische Epidemiologie

More information

Stat 502X Exam 2 Spring 2014

Stat 502X Exam 2 Spring 2014 Stat 502X Exam 2 Spring 2014 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed This exam consists of 12 parts. I'll score it at 10 points per problem/part

More information

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation Patrick Breheny February 8 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/27 Introduction Basic idea Standardization Large-scale testing is, of course, a big area and we could keep talking

More information

Bayesian Interpretations of Heteroskedastic Consistent Covariance Estimators Using the Informed Bayesian Bootstrap

Bayesian Interpretations of Heteroskedastic Consistent Covariance Estimators Using the Informed Bayesian Bootstrap Bayesian Interpretations of Heteroskedastic Consistent Covariance Estimators Using the Informed Bayesian Bootstrap Dale J. Poirier University of California, Irvine September 1, 2008 Abstract This paper

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

Uncertain Inference and Artificial Intelligence

Uncertain Inference and Artificial Intelligence March 3, 2011 1 Prepared for a Purdue Machine Learning Seminar Acknowledgement Prof. A. P. Dempster for intensive collaborations on the Dempster-Shafer theory. Jianchun Zhang, Ryan Martin, Duncan Ermini

More information

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Monte Carlo Studies. The response in a Monte Carlo study is a random variable. Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information

Bayesian Econometrics

Bayesian Econometrics Bayesian Econometrics Christopher A. Sims Princeton University sims@princeton.edu September 20, 2016 Outline I. The difference between Bayesian and non-bayesian inference. II. Confidence sets and confidence

More information

Probability and Statistics

Probability and Statistics Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 4: IT IS ALL ABOUT DATA 4a - 1 CHAPTER 4: IT

More information

Bios 6649: Clinical Trials - Statistical Design and Monitoring

Bios 6649: Clinical Trials - Statistical Design and Monitoring Bios 6649: Clinical Trials - Statistical Design and Monitoring Spring Semester 2015 John M. Kittelson Department of Biostatistics & Informatics Colorado School of Public Health University of Colorado Denver

More information

Bayesian network modeling. 1

Bayesian network modeling.  1 Bayesian network modeling http://springuniversity.bc3research.org/ 1 Probabilistic vs. deterministic modeling approaches Probabilistic Explanatory power (e.g., r 2 ) Explanation why Based on inductive

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

scrna-seq Differential expression analysis methods Olga Dethlefsen NBIS, National Bioinformatics Infrastructure Sweden October 2017

scrna-seq Differential expression analysis methods Olga Dethlefsen NBIS, National Bioinformatics Infrastructure Sweden October 2017 scrna-seq Differential expression analysis methods Olga Dethlefsen NBIS, National Bioinformatics Infrastructure Sweden October 2017 Olga (NBIS) scrna-seq de October 2017 1 / 34 Outline Introduction: what

More information

Objective Bayesian Estimation for the Number of Species

Objective Bayesian Estimation for the Number of Species Bayesian Analysis (2010) 5, Number 4, pp. 765 786 Objective Bayesian Estimation for the Number of Species Kathryn Barger and John Bunge Abstract. Objective priors have been used in Bayesian models for

More information

Recent Advances in Bayesian Inference Techniques

Recent Advances in Bayesian Inference Techniques Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian

More information

E. Santovetti lesson 4 Maximum likelihood Interval estimation

E. Santovetti lesson 4 Maximum likelihood Interval estimation E. Santovetti lesson 4 Maximum likelihood Interval estimation 1 Extended Maximum Likelihood Sometimes the number of total events measurements of the experiment n is not fixed, but, for example, is a Poisson

More information

Structure learning in human causal induction

Structure learning in human causal induction Structure learning in human causal induction Joshua B. Tenenbaum & Thomas L. Griffiths Department of Psychology Stanford University, Stanford, CA 94305 jbt,gruffydd @psych.stanford.edu Abstract We use

More information

Bayesian Nonparametric Regression for Diabetes Deaths

Bayesian Nonparametric Regression for Diabetes Deaths Bayesian Nonparametric Regression for Diabetes Deaths Brian M. Hartman PhD Student, 2010 Texas A&M University College Station, TX, USA David B. Dahl Assistant Professor Texas A&M University College Station,

More information

Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of

Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of Probability Sampling Procedures Collection of Data Measures

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

COMP90051 Statistical Machine Learning

COMP90051 Statistical Machine Learning COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 17. Bayesian inference; Bayesian regression Training == optimisation (?) Stages of learning & inference: Formulate model Regression

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Resampling techniques for statistical modeling

Resampling techniques for statistical modeling Resampling techniques for statistical modeling Gianluca Bontempi Département d Informatique Boulevard de Triomphe - CP 212 http://www.ulb.ac.be/di Resampling techniques p.1/33 Beyond the empirical error

More information

Reading Assignment. Distributed Lag and Autoregressive Models. Chapter 17. Kennedy: Chapters 10 and 13. AREC-ECON 535 Lec G 1

Reading Assignment. Distributed Lag and Autoregressive Models. Chapter 17. Kennedy: Chapters 10 and 13. AREC-ECON 535 Lec G 1 Reading Assignment Distributed Lag and Autoregressive Models Chapter 17. Kennedy: Chapters 10 and 13. AREC-ECON 535 Lec G 1 Distributed Lag and Autoregressive Models Distributed lag model: y t = α + β

More information

Bayesian Confidence Intervals for the Ratio of Means of Lognormal Data with Zeros

Bayesian Confidence Intervals for the Ratio of Means of Lognormal Data with Zeros Bayesian Confidence Intervals for the Ratio of Means of Lognormal Data with Zeros J. Harvey a,b & A.J. van der Merwe b a Centre for Statistical Consultation Department of Statistics and Actuarial Science

More information

Probability theory basics

Probability theory basics Probability theory basics Michael Franke Basics of probability theory: axiomatic definition, interpretation, joint distributions, marginalization, conditional probability & Bayes rule. Random variables:

More information