One-shot Learning of Poisson Distributions: Information Theory of the Audic-Claverie Statistic for Analyzing cDNA Arrays


Slide 1: One-shot Learning of Poisson Distributions: Information Theory of the Audic-Claverie Statistic for Analyzing cDNA Arrays
Peter Tiňo, School of Computer Science, University of Birmingham, UK

Slide 2: cDNA array analysis
Biologists analyze patterns of expression levels of selected genes in different tissues, possibly obtained under different conditions or treatment regimes.
Measurement of gene expression levels:
- via hybridization to microarrays, or
- by counting gene tags (signatures), using e.g. the Serial Analysis of Gene Expression (SAGE) or Massively Parallel Signature Sequencing (MPSS) methodologies.

Slide 3: SAGE
The SAGE procedure results in a library of short sequence tags, each representing an expressed gene.
Key assumption: every mRNA copy in the tissue has the same chance of ending up as a tag in the library. Selecting a specific tag from the pool of transcripts can therefore be approximately treated as sampling with replacement.
Key step in many SAGE studies: identification of interesting genes, typically those that are differentially expressed under different conditions/treatments. One compares the number of specific tags found in two SAGE libraries corresponding to different conditions or treatments.

Slide 4: The approach of Audic and Claverie
Audic and Claverie were among the first to systematically study the influence of random fluctuations and sampling size on the reliability of digital expression profile data.
It is a popular approach in current biological research: 427 citations (ISI Web of Knowledge), over 100 citations in the past 3 years.
Typically, cDNA libraries contain a large number of different expressed genes, so observing a given cDNA qualifies as a rare event.

Slide 5: A-C approach
Consider a transcript representing a small fraction of the library and a large number N of clones. The probability of observing x tags of the same gene is well-approximated by the Poisson distribution parametrized by λ ≥ 0:
P(X = x | λ) = e^(−λ) λ^x / x!
The unknown parameter λ signifies the number of transcripts of the given type (tag) per N clones in the cDNA library.
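A minimal numerical sketch of the Poisson model above (ours, not part of the slides; the helper name poisson_pmf is our own):

from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    # P(X = x | lambda) = e^(-lambda) * lambda^x / x!
    return exp(-lam) * lam ** x / factorial(x)

# e.g. the probability of observing x = 3 tags when lambda = 2:
print(poisson_pmf(3, 2.0))  # ~0.1804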

Slide 6: A-C approach (cont'd)
Null hypothesis of not-differentially-expressed genes: the tag count x in one library comes from the same underlying Poisson distribution P(· | λ) as the tag count y in the other library.
Each SAGE library represents a single (count) measurement only! From a purely statistical standpoint, resolving this issue is potentially quite problematic...
Key instrument of the A-C approach: a distribution P(y | x) over tag counts y in one library, informed by the tag count x in the other library, under the null hypothesis that the tag counts are generated from the same but unknown Poisson distribution.

Slide 7: So what do we really want to do?
[Cartoon: two Poisson "machines" with unknown rates λ1 and λ2 are each pressed once, yielding single counts x and y. From this one-shot evidence we are asked to decide whether λ1 = λ2. "Are you crazy??"]

Slide 8: P(y | x)
P(y | x) = ∫ p(y, λ | x) dλ
         = ∫ P(y | λ, x) p(λ | x) dλ
         = ∫ P(y | λ) [ P(x | λ) p(λ) / ∫_0^∞ P(x | λ') p(λ') dλ' ] dλ.
Imposing a flat prior p(λ) over the Poisson parameter λ results in
P(y | x) = (1/y!) ∫_0^∞ e^(−2λ) λ^(x+y) dλ / ∫_0^∞ e^(−λ) λ^x dλ.

Slide 9: A-C statistic
Since the Gamma distribution parametrized by a, b > 0 takes the form
Gamma(λ | a, b) = (1/Γ(a)) b^a λ^(a−1) e^(−bλ),
where Γ(a) = ∫_0^∞ u^(a−1) e^(−u) du is the Gamma function, we have
P(y | x) = (1/y!) Γ(x + y + 1) / (2^(x+y+1) Γ(x + 1)),
which, since x and y are integers (i.e. Γ(x + 1) = x!), can be rewritten as
P(y | x) = (1/2^(x+y+1)) (x + y)! / (x! y!) = (1/2^(x+y+1)) C(x + y, x).
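A small sketch (ours) of the closed form above, using math.comb for the binomial coefficient; it also checks that P(· | x) is a proper distribution and previews the symmetry discussed on the next slide:

from math import comb

def ac_statistic(y: int, x: int) -> float:
    # P(y | x) = C(x + y, x) / 2^(x + y + 1)
    return comb(x + y, x) / 2 ** (x + y + 1)

x = 10
total = sum(ac_statistic(y, x) for y in range(500))
print(abs(total - 1.0) < 1e-12)                  # True: probabilities sum to 1
print(ac_statistic(7, 3) == ac_statistic(3, 7))  # True: P(y | x) = P(x | y)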

Slide 10: A-C statistic (cont'd)
P(y | x) is symmetric: for x, y ≥ 0, P(y | x) = P(x | y).
This is a desirable property: if the counts x, y are related to two libraries of the same size, they should be interchangeable when analyzing whether they come from the same underlying process or not.
The A-C statistic can be used e.g. for principled inference, construction of confidence intervals, statistical testing, etc.
We ask:
1. How natural is the A-C statistic's representation of the underlying unknown Poisson distribution governing the tag counts?
2. Given that the observed tag count sample is very limited, how well can the Audic-Claverie approach work, i.e. how well does the A-C statistic capture the underlying Poisson distribution?

Slide 11: Poisson distribution vs A-C statistic
[Figure: graphs of the A-C statistic P(y | x) (solid line) and the corresponding Poisson distribution P(y | λ) at λ = x (dashed line), for x = 10 (a) and x = 30 (b).]

Slide 12: A-C statistic: mode structure
The A-C statistic and the underlying Poisson distribution are quite similar in their nature.
For any integer mean tag count λ ≥ 1, the Poisson distribution P(· | λ) has two neighboring modes located at λ and λ − 1, with P(λ | λ) = P(λ − 1 | λ).
Analogously, after observing a count x, the A-C statistic expects the counts y = x and y = x − 1 with the highest (and equal) probability. The other values of the count y are less probable.

Slide 13: A-C statistic: mode structure (cont'd)
Theorem 1. Let x, y and d be integers with ranges specified below. It holds:
1. P(x | x) > P(x + d | x) for any x ≥ 0 and d ≥ 1.
2. For x ≥ 1, P(x | x) = P(x − 1 | x).
3. P(x | x) > P(x − d | x) for any x ≥ 2 and 2 ≤ d ≤ x.
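A quick exhaustive check (ours) of the three claims of Theorem 1, reusing the ac_statistic helper from the earlier sketch:

# verify Theorem 1 for all x < 50 (claim 1 tested for d up to 50)
for x in range(50):
    assert all(ac_statistic(x, x) > ac_statistic(x + d, x) for d in range(1, 51))
    if x >= 1:
        assert ac_statistic(x, x) == ac_statistic(x - 1, x)
    if x >= 2:
        assert all(ac_statistic(x, x) > ac_statistic(x - d, x) for d in range(2, x + 1))
print("Theorem 1 holds for all tested x")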

Slide 14: A-C statistic: mean and variance
Theorem 2. Consider a non-negative integer x and the associated A-C statistic P(y | x). Then it holds:
1. E_{P(y|x)}[y] = x + 1.
2. Var_{P(y|x)}[y] = E_{P(y|x)}[(y − E_{P(y|x)}[y])²] = 2 E_{P(y|x)}[y].
Note that for the Poisson distribution P(y | λ = x): E[y] = Var[y] = x.
The larger variance of P(· | x) is the result of Bayesian averaging of Poisson distributions under a flat prior.
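The mean and variance in Theorem 2 can be confirmed numerically with the same helper (our sketch):

x = 10
mean = sum(y * ac_statistic(y, x) for y in range(1000))
var = sum((y - mean) ** 2 * ac_statistic(y, x) for y in range(1000))
print(round(mean, 6), round(var, 6))  # 11.0, 22.0: that is, x + 1 and 2(x + 1)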

Slide 15: A-C statistic: information theory
Assume that there is some true underlying Poisson distribution P(y | λ) over possible counts y ≥ 0, with unknown mean λ. From the same process we first generate a count x and then use the A-C statistic P(y | x) to define a distribution over y, given the already observed count x.
We ask: how different, in terms of Kullback-Leibler (K-L) divergence, are the two distributions P(y | x) and P(y | λ) over y?
For the A-C approach to work, one would like P(y | x) to be sufficiently representative of the true unknown distribution P(y | λ).

Slide 16: Thought experiment 1
[Cartoon: the environment P(· | λ) is pressed once, producing a single count x and hence the statistic P(· | x). "Let's see how close you can get." "What do you mean by 'close'?" The quantity of interest is the "similarity" between P(· | λ) and P(· | x).]

Slide 17: K-L divergence from P(y | λ) to P(y | x)
D(λ, x) = D_KL[P(y | λ) || P(y | x)] = Σ_{y=0}^∞ P(y | λ) log [ P(y | λ) / P(y | x) ].
We have
D(λ, x) = −H[P(y | λ)] + log x! + (λ + x + 1) log 2 + F(λ, 0) − F(λ, x),
where for each integer d ≥ 0
F(λ, d) = E_{P(y|λ)}[log (y + d)!] = Σ_{y=0}^∞ P(y | λ) log (y + d)!,
and H[P(y | λ)] is the entropy of P(y | λ).
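As a consistency check (our sketch, with helper names of our own choosing), D(λ, x) can be computed both directly from the definition and via the decomposition above; lgamma(n + 1) plays the role of log n!:

from math import exp, lgamma, log

def log_pois(y, lam):
    # log P(y | lambda) for the Poisson pmf
    return -lam + y * log(lam) - lgamma(y + 1)

def log_ac(y, x):
    # log P(y | x) = log C(x + y, x) - (x + y + 1) * log 2
    return lgamma(x + y + 1) - lgamma(x + 1) - lgamma(y + 1) - (x + y + 1) * log(2)

def kl_direct(lam, x, ymax=400):
    # definition: sum over y of P(y | lam) * log[P(y | lam) / P(y | x)]
    return sum(exp(log_pois(y, lam)) * (log_pois(y, lam) - log_ac(y, x))
               for y in range(ymax))

def kl_decomposed(lam, x, ymax=400):
    # -H[P(y | lam)] + log x! + (lam + x + 1) log 2 + F(lam, 0) - F(lam, x)
    p = [exp(log_pois(y, lam)) for y in range(ymax)]
    H = -sum(p[y] * log_pois(y, lam) for y in range(ymax))
    F = lambda d: sum(p[y] * lgamma(y + d + 1) for y in range(ymax))
    return -H + lgamma(x + 1) + (lam + x + 1) * log(2) + F(0) - F(x)

print(kl_direct(10.0, 10), kl_decomposed(10.0, 10))  # the two values agree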

Slide 18: Minimum of D(λ, x)
One might intuitively expect D(λ, x) to be minimal at x = λ: the conditioning count in the A-C statistic would then be the mean of the underlying Poisson distribution. However, the second mode of that Poisson distribution, λ − 1, is surrounded by enough probability mass to yield:
Theorem 3. For any integer λ ≥ 1, it holds that D(λ, λ) > D(λ, λ − 1). In other words,
D_KL[P(y | λ) || P(y | x = λ)] > D_KL[P(y | λ) || P(y | x = λ − 1)].
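Theorem 3 can be spot-checked with the kl_direct helper from the previous sketch:

for lam in range(1, 30):
    assert kl_direct(float(lam), lam) > kl_direct(float(lam), lam - 1)
print("D(lam, lam) > D(lam, lam - 1) for lam = 1..29")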

Slide 19: Thought experiment 2
[Cartoon: the environment P(· | λ) is now pressed many times. "Repeat this many times... and see how close I get on average." Each press produces a count x and a statistic P(· | x); the "similarity" between P(· | λ) and P(· | x) is averaged over the presses.]

Slide 20: Expectation of D(λ, x) under sampling of x
Given an underlying Poisson distribution P(x | λ), if we repeatedly generated a representative count x from P(x | λ), what would be the average divergence of the corresponding A-C statistic P(y | x) from the truth P(y | λ)?
We are interested in the quantity
E(λ) = E_{P(x|λ)}[D(λ, x)].   (1)
Up to terms of order O(1/λ), the expected divergence of the A-C statistic P(y | x) from the true underlying Poisson distribution P(y | λ) is equal to (1/2) log 2.

Slide 21: Expectation of D(λ, x)
Theorem 4. Consider an underlying Poisson distribution P(· | λ) parametrized by some λ > 0. Then
E(λ) = E_{P(x|λ)}[ D_KL[P(y | λ) || P(y | x)] ] = (1/2) log 2 + O(1/λ).
Sketch of proof:
E(λ) = λ (log λ − log e + 2 log 2) + log 2 + F(λ, 0) − E_{P(x|λ)}[F(λ, x)],
E_{P(x|λ)}[F(λ, x)] = F(2λ, 0),
F(λ, 0) = λ (log λ − log e) + (1/2) log(2πeλ) + O(1/λ).
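A numerical illustration of Theorem 4 (our sketch): E(λ) is evaluated as a Poisson-weighted sum of the divergences and compared to (1/2) log 2:

def expected_kl(lam, xmax=200):
    # E(lam) = sum over x of P(x | lam) * D(lam, x)
    return sum(exp(log_pois(x, lam)) * kl_direct(lam, x) for x in range(xmax))

half_log2 = 0.5 * log(2)  # about 0.3466 nats, i.e. half a bit
for lam in (1.0, 5.0, 20.0):
    print(lam, expected_kl(lam), half_log2)
# E(lam) tends to (1/2) log 2 as lam grows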

Slide 22: Higher order expansion of E(λ)
Entropy expansion (natural logarithm):
H[P(y | λ)] = (1/2) log(2πeλ) − 1/(12λ) − 1/(24λ²) + O(1/λ³).
Expected divergence, measured in bits:
E(λ) = 1/2 − 1/(24λ ln 2) − 1/(32λ² ln 2) + O(1/λ³).
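Continuing the sketch, and treating the reconstructed expansion coefficients above as an assumption, the asymptotic formula can be compared with the numerically summed E(λ), converted to bits:

for lam in (5.0, 10.0, 40.0):
    exact_bits = expected_kl(lam) / log(2)
    approx_bits = 0.5 - 1 / (24 * lam * log(2)) - 1 / (32 * lam ** 2 * log(2))
    print(lam, round(exact_bits, 5), round(approx_bits, 5))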

Slide 23: Analytical approximation of E(λ)
[Figure: E[D_KL] plotted as a function of λ; the numerically computed values ("numer") are shown against the analytical approximation ("anal").]

Slide 24: Discussion
The Audic-Claverie method is a popular approach for detection of differentially expressed genes in the SAGE framework.
Main assumption: under the null hypothesis, the tag counts x, y in two libraries come from the same but unknown Poisson distribution P(· | λ).
The problem: each SAGE library represents only a single measurement.
The Poisson distribution is rather "rigid": it is unimodal and parametrized by a single parameter λ representing both its mean and variance.
As a result, learning about P(· | λ) from a very limited sample (as one is effectively bound to do in the SAGE framework) is much less problematic than one might naively expect.

Slide 25: Discussion (cont'd)
We analyzed how close the A-C statistic P(· | x) is, in terms of K-L divergence, to the underlying Poisson distribution P(· | λ) of the tag counts.
On average, the A-C statistic is never too far from the true underlying distribution: up to terms of order O(1/λ³), on average, the A-C statistic is never further away from the truth P(· | λ) than half a bit of additional information.
Hence, the Audic-Claverie method can be expected to work well even though the SAGE libraries represent very sparse samples.

Slide 26: Discussion (cont'd)
So far, the Audic-Claverie methodology has been verified only empirically, through a series of specific Monte Carlo simulations; it has not been clear how general the apparently stable simulation findings were.
The A-C statistic is universally applicable in any situation where inferences about an underlying Poisson distribution must be made based on an extremely sparse sample.
In the Monte Carlo simulations, the false alarm rate was small for genes associated with small tag counts and gradually increased for higher tag counts. These findings are consistent with our theoretically calculated divergence function E(λ).

Slide 27: Thank you!
Further reading, full proofs, etc.:
P. Tiňo: Basic Properties and Information Theory of Audic-Claverie Statistic for Analyzing cDNA Arrays. BMC Bioinformatics, 10:310, 2009. Open access, or pxt/my.publ.html
