Decision Theoretic Classification of Copy-Number-Variation in Cancer Genomes
1 Decision Theoretic Classification of Copy-Number-Variation in Cancer Genomes
Christopher Holmes (joint work with Chris Yau)
Department of Statistics and Wellcome Trust Centre for Human Genetics, University of Oxford, and MRC Harwell
Bordeaux, October 2011
2 Overview
- Motivating problem: copy-number variation in the human and cancer genomes
- Classification models for sequences: state-space hidden Markov models
  - how to make predictions?
  - optimal predictions under computational constraints
- Application: copy-number aberrations in cancer genomes
- Conclusions
3 Copy-Number-Variation
- CNVs are quite common in normal germline (sperm / egg) genomes, and very common in cancers, where they are known as CNAs
- Around 7% to 10% of the human genome is under copy number variation
- Somatic copy number variations induced during mitosis (cell division), termed Copy Number Aberrations (CNAs), are key drivers of cancer
4 Typical CNV data file
- Probe-based technologies allow one to (noisily) measure the amount of genomic content at pre-defined positions across the genome
- We are interested in statistical approaches to identify these regions of CNVs across 100,000s of loci
5 Classification of linear sequence
This can be thought of as a generic task in classification on ordered data. That is, suppose you have general predictors ordered in some sequence $\{x^{(1)}, x^{(2)}, \ldots, x^{(T)}\}$, say by time, genomic location, a covariate, etc., and you wish to build a classifier for each observation, $S_i \in \{1, \ldots, M\}$ classes,
$$\Pr(S_i = j \mid x^{(i)})$$
but where there is persistence (dependence) with index in the classifier,
$$\Pr(S_i = j, S_{i-1} = k \mid x^{(i)}, x^{(i-1)}) \neq \Pr(S_i = j \mid x^{(i)}) \, \Pr(S_{i-1} = k \mid x^{(i-1)})$$
6 Classification of linear sequence
- Note, this is not a conventional classification task, as we don't have access to labelled training data, $\{S_i, x^{(i)}\}_{i=1}^{T}$
- It's not a conventional clustering task, as the number, $M$, and labels of the classes are well defined: {deletion, normal, duplication, etc.}
- It is part classification (the goal) and part clustering, and we know that the class labels persist on the index,
$$\Pr(S_i = j \mid S_{i-1} = j) > \Pr(S_i = j)$$
7 Classification
Two approaches are typically adopted in the literature:
- Partition or change-point models, which divide the region up into $K$ (unknown) contiguous blocks and then post-classify (assign) each region to a class,
$$f(x) = \prod_{j=1}^{K} f(x_j \mid \theta_j)$$
- Hidden state-space models, which jointly find the regions AND classify each region directly according to the (hidden) class state, $S_j$,
$$f(x) = \prod_{j=1}^{K} f(x_j \mid \theta_{S_j})$$
8 State-space model
To capture dependence in copy-number state across loci we can use a Hidden Markov Model (HMM) with hidden state index $S_i \in \{1, \ldots, M\}$,
$$f(x \mid S) = \prod_{i=1}^{T} f(x_i \mid \theta_{S_i})$$
$$\Pr(S_{i+1} = j \mid S_i = k) = (\Pi)_{jk}$$
where $\Pi$ is the $M \times M$ transition matrix and $\theta_{S_j}$ are the parameters in the likelihood associated with state $S_j$.
- So the transition matrix $\Pi$ allows us to capture (extensive) prior beliefs regarding the length scales of CNV events
- Note, $(\Pi)_{jj}$ governs the expected holding time (span) in state $S_i = j$
- Change-points in classification occur when $S_i \neq S_{i-1}$
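The link between the diagonal of the transition matrix and the span of an event can be made concrete. For a time-homogeneous Markov chain the holding time in state $j$ is geometric, so its mean is $1 / (1 - (\Pi)_{jj})$. The following is a minimal sketch of that calculation; the function name and the example matrix are illustrative assumptions, not values from the talk.

```python
def expected_holding_times(transition):
    """Mean number of consecutive loci spent in each state.

    Only the diagonal entries are used, so the result is the same whether
    the matrix is stored row-stochastic or column-stochastic.
    """
    times = []
    for j, row in enumerate(transition):
        stay = row[j]
        if not 0.0 <= stay < 1.0:
            raise ValueError("self-transition probability must lie in [0, 1)")
        # Geometric holding time: mean = 1 / (1 - Pr(stay)).
        times.append(1.0 / (1.0 - stay))
    return times


# Hypothetical two-state chain: state 0 = normal, state 1 = aberration.
example_transition = [
    [0.999, 0.001],
    [0.010, 0.990],
]
holding = expected_holding_times(example_transition)
print(holding)
```

With these assumed values the normal state spans roughly 1000 loci on average and the aberration state roughly 100, which is the sense in which the diagonal encodes prior beliefs about event length.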
9 Inference
For the purposes of this talk, we assume that you have adopted your favourite Bayesian model, $\Pr(x, \theta, S)$, and computational algorithm to obtain an approximation to the marginal posterior density
$$\Pr(S \mid x) = \int \Pr(S, \theta \mid x) \, d\theta$$
The question is: now what are we going to do with it?
Note, $\Pr(S \mid x)$ is typically of dimension $> 30^{20{,}000}$ states.
10 Predictions
- Most scientists wish to explore predictions under the model to highlight probable CNVs, and maybe rank them or look for association with disease traits
- But how to choose a prediction?
- How can we grade the quality (or utility) of a classification prediction on a sequence?
11 The nature of errors
In conventional classification models, or polychotomous regression, a good way to assess the performance is to count errors via some statistic,
$$\epsilon = \sum_{i=1}^{T} l(\hat{S}_i, S_i)$$
where $\hat{S}_i$ is the prediction and $S_i$ is the true state that nature obtains.
But when classifying on a sequence of states this is less straightforward. For illustration, consider the following binary classification, where there is a notion of a null state, $S_i = 0$, and an alternative, $S_i = 1$.
12 Nature of errors
Both (a) and (b), and both (c) and (d), have the same number of pointwise errors.
13 Judging predictions
It's clear that judging the performance of the model is not as simple as for regular classification models. Moreover, the performance is explicitly linked to what you're going to do with the prediction (action-orientated). So two problems exist:
(a) We need a way to judge performance: we need a loss function
(b) Given a loss function, we need a way to find $\hat{S}$, the state sequence (prediction(s)) of minimum expected loss with respect to $\Pr(S \mid x)$, among the region of $30^{20{,}000}$ state sequences for the problems we shall consider
14 Standard summaries for Hidden Markov models
There are two (ubiquitous) approaches to reporting states from HMMs:
1. The most probable or MAP sequence, using the Viterbi algorithm,
$$\hat{S} = \arg\max_{Z} \Pr(S = Z \mid x)$$
2. The sequence of states that maximize the posterior marginals, using the forward-backward algorithm, $\hat{S} = \{\hat{S}_1, \ldots, \hat{S}_T\}$,
$$\hat{S}_i = \arg\max_{Z_i} \Pr(S_i = Z_i \mid x), \qquad \Pr(S_i \mid x) = \sum_{S_{-i}} \Pr(S \mid x)$$
where $S_{-i}$ denotes all states other than the $i$th.
Under both schemes, $\hat{S}$ can be found in order $O(T M^2)$ computation. But do these relate to sensible choices?
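The two summaries can be sketched in a few lines of plain Python. This is an illustrative implementation under stated assumptions, not the code used in the talk: the model parameters are taken as known, `trans[k][j]` stores $\Pr(S_{i+1} = j \mid S_i = k)$ (the transpose of the $(\Pi)_{jk}$ convention on the slides), `emis[i][j]` stores $f(x_i \mid \theta_j)$, all probabilities are strictly positive, and the numbers in the example are made up.

```python
import math


def posterior_marginals(init, trans, emis):
    """Forward-backward: returns marg[i][j] = Pr(S_i = j | x)."""
    T, M = len(emis), len(init)
    alpha = []
    for i in range(T):
        if i == 0:
            raw = [init[j] * emis[0][j] for j in range(M)]
        else:
            raw = [emis[i][j] * sum(alpha[i - 1][k] * trans[k][j] for k in range(M))
                   for j in range(M)]
        total = sum(raw)
        alpha.append([r / total for r in raw])  # rescale to avoid underflow
    beta = [[1.0] * M for _ in range(T)]
    for i in range(T - 2, -1, -1):
        raw = [sum(trans[k][j] * emis[i + 1][j] * beta[i + 1][j] for j in range(M))
               for k in range(M)]
        total = sum(raw)
        beta[i] = [r / total for r in raw]
    marg = []
    for i in range(T):
        raw = [alpha[i][j] * beta[i][j] for j in range(M)]
        total = sum(raw)
        marg.append([r / total for r in raw])
    return marg


def viterbi(init, trans, emis):
    """Most probable (MAP) state sequence, computed in log space."""
    T, M = len(emis), len(init)
    score = [math.log(init[j]) + math.log(emis[0][j]) for j in range(M)]
    back = []
    for i in range(1, T):
        new_score, ptr = [], []
        for j in range(M):
            k_best = max(range(M), key=lambda k: score[k] + math.log(trans[k][j]))
            new_score.append(score[k_best] + math.log(trans[k_best][j])
                             + math.log(emis[i][j]))
            ptr.append(k_best)
        score = new_score
        back.append(ptr)
    path = [max(range(M), key=lambda j: score[j])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical two-state example (0 = null, 1 = event) over four loci.
init = [0.9, 0.1]
trans = [[0.95, 0.05], [0.10, 0.90]]
emis = [[0.8, 0.2], [0.3, 0.7], [0.4, 0.6], [0.9, 0.1]]

marginals = posterior_marginals(init, trans, emis)
map_path = viterbi(init, trans, emis)
marginal_calls = [max(range(2), key=lambda j: m[j]) for m in marginals]
print(map_path, marginal_calls)
```

Both routines visit each of the $T$ loci once and do $M^2$ work per locus, which is where the $O(T M^2)$ cost comes from.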
15 Analysis on data
16 Standard summaries for Hidden Markov models
1. The most probable or MAP sequence looks good, but it's inflexible. If I wish to consider more calls, or fewer calls, I have no handle on this.
2. For the sequence of states that maximize the posterior marginals you do have a simple handle on making calls,
$$\hat{S}_i = \begin{cases} 0, & \text{if } \Pr(S_i = 1 \mid x) < \alpha \\ 1, & \text{otherwise} \end{cases}$$
where $\alpha$ controls the FDR, false-positives / false-negatives.
However, does this give sensible predictions as you vary $\alpha$?
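The thresholding rule is simple enough to write down directly. The sketch below applies it to a made-up vector of posterior marginals $\Pr(S_i = 1 \mid x)$; the function name and the numbers are assumptions for illustration only.

```python
def call_by_threshold(prob_event, alpha):
    """Return 0 where Pr(S_i = 1 | x) < alpha and 1 otherwise."""
    return [0 if p < alpha else 1 for p in prob_event]


# Hypothetical posterior marginals Pr(S_i = 1 | x) along eight loci.
prob_event = [0.05, 0.40, 0.85, 0.55, 0.90, 0.45, 0.70, 0.10]

calls_by_alpha = {}
for alpha in (0.3, 0.5, 0.8):
    calls_by_alpha[alpha] = call_by_threshold(prob_event, alpha)
    print(alpha, calls_by_alpha[alpha], sum(calls_by_alpha[alpha]))
```

In this toy case the number of called loci shrinks from 6 to 4 to 2 as $\alpha$ grows, so the handle is monotone, but the calls at $\alpha = 0.5$ and $\alpha = 0.8$ break into two separate fragments rather than one contiguous event.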
17 Analysis on data
18 Analysis on data
19 Analysis on data
20 Analysis on data
21 Analysis on data
22 Analysis on data
23 Calling using posterior marginals
- So, on the positive side, for the posterior marginals we can find $\hat{S}$ exactly under $\Pr(S \mid x)$, using the forward-backward algorithm, and the predictions are flexible to some FDR parameter, monotone in $\alpha$
- On the negative side, the predictions don't look so good, as we're really after the CNV events rather than point predictions
- To see why this is so, we can take a closer look at the implicit cost functions that the MAP, via Viterbi, and the posterior marginals, via forward-backward, are solving
24 Standard summaries for Hidden Markov models
1. The most probable or MAP sequence, $\hat{S} = \arg\max_{Z} \Pr(S = Z \mid x)$, relates to a 0-1 loss over the whole sequence,
$$l(Z, S) = \begin{cases} 0, & Z = S \\ 1, & Z \neq S \end{cases}$$
where $Z$ is the prediction and nature obtains $S$.
2. The sequence of states that maximize the marginals, $\hat{S}_i = \arg\max_{Z_i} \Pr(S_i = Z_i \mid x)$, relates to a pointwise loss, invariant to permutations,
$$l(Z, S) = \sum_{i} l(Z_i, S_i), \qquad l(Z_i, S_i) = \begin{cases} 0, & Z_i = S_i \\ 1, & Z_i \neq S_i \end{cases}$$
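A small numerical illustration shows why neither loss distinguishes between calls that a user would judge very differently. The sequences below are invented for this purpose, and `count_events` is a hypothetical helper that counts separately called events (maximal runs of 1s).

```python
def zero_one_sequence_loss(call, truth):
    """0-1 loss over the whole sequence: 0 only if every locus matches."""
    return 0 if list(call) == list(truth) else 1


def pointwise_loss(call, truth):
    """Sum of per-locus 0-1 losses (the number of mismatched loci)."""
    return sum(1 for z, s in zip(call, truth) if z != s)


def count_events(call):
    """Number of maximal runs of 1s, i.e. separately called events."""
    return sum(1 for i, z in enumerate(call)
               if z == 1 and (i == 0 or call[i - 1] == 0))


truth = [0, 0, 1, 1, 1, 1, 0, 0]
call_a = [0, 0, 1, 1, 0, 0, 0, 0]  # one event, called too short
call_b = [0, 0, 1, 0, 1, 0, 0, 0]  # the same event split into two fragments

for name, call in (("a", call_a), ("b", call_b)):
    print(name, zero_one_sequence_loss(call, truth),
          pointwise_loss(call, truth), count_events(call))
```

Both calls score 1 under the sequence-level loss and 2 under the pointwise loss, yet one reports a single event and the other reports two.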
25 These loss functions are not really suited to the way people wish to look at CNV predictions. However, to a large extent, the MAP and the set of marginals seem to be the only methods considered in around 30 years of HMMs.
26 Loss Functions under computational constraints
One can ponder on a proper loss function appropriate for CNV classification, $l(Z, S)$... but then...
$$\hat{S} = \arg\min_{Z} E_{\Pr(S \mid x)}[l(Z, S)]$$
is not computable (for the ones I could think of).
27 Optimal predictions under computational constraints
Solution: think of a loss function with which you can compute $\hat{S}$ and which solves an approximating optimal prediction problem. That is, solve a different but related decision problem precisely.
28 Markov Loss Functions
We have developed the use of $k$th-order Markov loss functions specifically for calling CNVs,
$$l(Z, S) = \sum_{i=1}^{T-k} l(Z_i, \ldots, Z_{i+k}, S_i, \ldots, S_{i+k})$$
and in particular we shall consider the simplest case of first-order loss,
$$l(Z, S) = \sum_{i=1}^{T-1} l(Z_i, Z_{i+1}, S_i, S_{i+1})$$
Assuming the loss is homogeneous (does not alter over the sequence), this leads to an $M^{k+1} \times M^{k+1}$ loss matrix with $M^{2k+2}$ entries which, for $M = 2$, $k = 1$, we can write as follows.
29 Table: Loss matrix $l(Z_i, Z_{i+1}, S_i, S_{i+1})$ for $M = 2$ binary state transitions. Rows are the call $(Z_i, Z_{i+1})$; columns are nature $(S_i, S_{i+1})$.

Call \ Nature   (0, 0)           (0, 1)           (1, 0)           (1, 1)
(0, 0)          Perfect          FNH              GoodCorrection   FN
(0, 1)          FPT              Perfect          XX               GoodCorrection
(1, 0)          GoodCorrection   XX               Perfect          FNT
(1, 1)          FP               GoodCorrection   FPH              Perfect

FNH = FalseNegativeHold, FNT = FalseNegativeTransition, FPH = FalsePositiveHold, FPT = FalsePositiveTransition, FN = FalseNegative, FP = FalsePositive, XX = Oh no
30 Adapting your loss
- So by adjusting {Perfect, GoodCorrection, FNH, FPH, FNT, FPT, XX, FN, FP} we can adapt calls to suit the application
- Note, if {Perfect, GoodCorrection} = 0 and {FNH, ..., FP} = $\gamma$, then we get back to the (zeroth-order) marginal loss
- So we can see that the use of posterior marginal calls is equivalent to statements of symmetry on the types of losses incurred
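The reduction to the marginal loss can be checked mechanically. The sketch below encodes the binary first-order loss as a nested dictionary, under the assumption that rows are calls $(Z_i, Z_{i+1})$, columns are nature $(S_i, S_{i+1})$, and GoodCorrection marks the cells where the call is wrong at $i$ but right at $i+1$; the function name and argument names are hypothetical.

```python
from itertools import product


def make_loss(perfect, good_correction, fnh, fn, fpt, xx, fnt, fp, fph):
    """Return loss[(z, z_next)][(s, s_next)] for binary states.

    Outer key: the call (Z_i, Z_{i+1}); inner key: nature (S_i, S_{i+1}).
    The cell layout is an assumed reading of the loss table.
    """
    return {
        (0, 0): {(0, 0): perfect, (0, 1): fnh,
                 (1, 0): good_correction, (1, 1): fn},
        (0, 1): {(0, 0): fpt, (0, 1): perfect,
                 (1, 0): xx, (1, 1): good_correction},
        (1, 0): {(0, 0): good_correction, (0, 1): xx,
                 (1, 0): perfect, (1, 1): fnt},
        (1, 1): {(0, 0): fp, (0, 1): good_correction,
                 (1, 0): fph, (1, 1): perfect},
    }


# Zero loss for Perfect and GoodCorrection, a common penalty gamma elsewhere.
gamma = 1.0
loss = make_loss(0.0, 0.0, *([gamma] * 7))

# Check: the loss now depends only on whether the call at i+1 is wrong,
# which is the pointwise (zeroth-order) loss scaled by gamma.
pairs = list(product((0, 1), repeat=2))
recovers_pointwise = all(
    loss[call][nature] == gamma * (call[1] != nature[1])
    for call in pairs for nature in pairs
)
print(recovers_pointwise)
```

Under this layout the symmetric choice collapses to a penalty of $\gamma$ whenever $Z_{i+1} \neq S_{i+1}$, so any asymmetry between the seven error types is what moves the calls away from the posterior marginal rule.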
31 Calculating $\hat{S}$, the Sequence of Minimum Loss
The expected first-order loss is given by
$$E_{\Pr(S \mid x)}[l(Z, S)] = \sum_{S} \left\{ \sum_{i=1}^{T-1} l(Z_i, Z_{i+1}, S_i, S_{i+1}) \right\} \Pr(S \mid x)$$
where, by exchanging the order of summation,
$$E_{\Pr(S \mid x)}[l(Z, S)] = \sum_{i=1}^{T-1} \left\{ \sum_{S} l(Z_i, Z_{i+1}, S_i, S_{i+1}) \Pr(S \mid x) \right\} = \sum_{i=1}^{T-1} \sum_{\{S_i, S_{i+1}\}} l(Z_i, Z_{i+1}, S_i, S_{i+1}) \Pr(S_i, S_{i+1} \mid x)$$
32 Dynamic Programming
- The expected loss is additive, so $\hat{S}$ can be found exactly using dynamic programming recursions (similar to the Viterbi algorithm)
- For first-order loss the computational cost is $O(M^4 T)$, where $M$ is the number of hidden states and $T$ is the length of the sequence; more generally $O(M^{2k+2} T)$ for a $k$th-order Markov loss function
- Note, the approach is applicable to any discrete state sequence model $\Pr(S \mid x)$, e.g. change-point models
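A minimal sketch of such a recursion for the first-order case follows. It assumes the pairwise posterior marginals $\Pr(S_i, S_{i+1} \mid x)$ have already been computed (for an HMM, by forward-backward) and are passed in as `pair_marginals[i][a][b]`; the function names, the example loss and the numbers are all hypothetical, and this is not the authors' implementation.

```python
def min_expected_loss_path(pair_marginals, loss, n_states):
    """Call sequence Z minimising the expected first-order Markov loss.

    pair_marginals[i][a][b] = Pr(S_i = a, S_{i+1} = b | x), i = 0..T-2.
    loss(z, z_next, s, s_next) is the local loss for one adjacent pair.
    Returns (path, expected_loss).
    """
    M = n_states
    # Expected local cost of calling (z, z_next) at each adjacent pair:
    # M^2 call pairs times M^2 nature pairs, hence O(M^4) per locus.
    step_cost = [
        [[sum(loss(z, z_next, a, b) * p[a][b]
              for a in range(M) for b in range(M))
          for z_next in range(M)] for z in range(M)]
        for p in pair_marginals
    ]
    best = [0.0] * M  # best[z] = minimal cost of any prefix ending in z
    back = []
    for cost in step_cost:
        new_best, ptr = [], []
        for z_next in range(M):
            z = min(range(M), key=lambda k: best[k] + cost[k][z_next])
            new_best.append(best[z] + cost[z][z_next])
            ptr.append(z)
        best = new_best
        back.append(ptr)
    last = min(range(M), key=lambda j: best[j])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1], best[last]


def example_loss(z, z_next, s, s_next):
    # Hypothetical loss: unit cost for a wrong call at the second locus,
    # plus an extra cost when the call jumps but nature does not.
    wrong = 1.0 if z_next != s_next else 0.0
    false_jump = 2.0 if (z != z_next and s == s_next) else 0.0
    return wrong + false_jump


# Made-up pairwise posteriors for T = 4 loci and M = 2 states.
pair_marginals = [
    [[0.70, 0.10], [0.05, 0.15]],
    [[0.30, 0.35], [0.05, 0.30]],
    [[0.20, 0.15], [0.25, 0.40]],
]
best_path, best_cost = min_expected_loss_path(pair_marginals, example_loss, 2)
print(best_path, best_cost)
```

The structure mirrors Viterbi, with the expected local loss taking the place of the log-probability and a minimum taking the place of the maximum; nothing in the recursion depends on how the pairwise posteriors were obtained.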
33 Example
For illustration we consider the previous data example with
{Perfect, GoodCorrection} = 0, {FN, FNH, FPH} = 1, {FP, FNT, FPT} = $\gamma$, XX = 500
with $\gamma > 1$, which penalises false positives and also jumps (calls) when there should not have been any. That is, once you have made a prediction of an event, $S_i = 1$, you are committed to it.
34 Analysis on data
35 Analysis on data
36 Analysis on data
37 Analysis on data
38 Analysis on data
39 Analysis on data
40 Analysis on data
And barring an XX, this case is identical to the marginal loss.
41 Behaviour
So by altering the penalties on certain error types we can align the predictions to the context of the analysis. It's important to note that this loss specification is separated from the actual modelling task:
1. Build the best possible model of nature, $\pi(x) = \int \Pr(x, \theta) \, d\theta$
2. Define a loss specific to the task at hand and compute $\hat{S}$
These two tasks are separate.
42 Motivating Ongoing Studies
The methodology we present has largely been shaped by two major ongoing collaborations:
- Clinical Diagnostics for Leukemia. Anna Schuh, Jenny Taylor. Longitudinal analysis (diagnosis/relapse) of Chronic Lymphocytic Leukemia (CLL) blood samples. 400 samples. £1.2m Health Innovation Challenge Fund award (NHS-Wellcome Trust)
- Ludwig Colon Cancer Initiative. Oliver Seiber (Melbourne LCI). Genome-wide -omics study of 1,000+ paired normal-tumour colon cancer samples
43 Copy-Number-Aberrations (CNAs) in Cancer
It is well established that CNAs are key drivers of oncogenesis through
- deletion of tumour suppressor genes
- duplication of oncogenes
- loss of heterozygosity (LOH)
However, real data is much more complicated than the previous illustrative examples:
- non-standard noise distributions (NP or SP likelihoods)
- tumour heterogeneity (mixture models)
- stromal contamination (de-convolution)
- longitudinal design, population samples
44 Copy-Number-Variation One major challenge in the analysis of CNAs is that the sample contains mixtures of population sub-types
45 Tumor Heterogeneity So the sample represents a population of cells from multiple subtypes
46 SNP genotyping arrays
- Simply measuring total DNA content, such as in array-CGH, will not help, as neither copy-neutral LOH nor heterogeneity levels are identifiable from the data
- Ideally you would like to do single-cell sequencing, but we're still a few years off this
- However, technologies originally designed to genotype single-nucleotide polymorphisms (SNPs) turn out to be useful
- SNP: a single base in the genome where two (or more) of {A, G, C, T} are found in the population
47 Genotyping So the genotypes form three clusters in the probe-output space But what happens to the signal when you have a CNV at the SNP?
48 CNV
49 Polar Coordinates
It's much simpler to work in polar coordinates. So at SNPs the data looks like... And for CNAs...
50 CNAs
In theory it should look like this... but in practice it looks like this...
51 Generalized Genotyping: actual (messy) data Noise in the real data means we need to see the signal span multiple adjacent loci in order to be confident of a call
52 Genome-wide data: colon cancer
SNP data is composed of two types of signal:
- Log R Ratio ($r$), whose magnitude is related to the total number of alleles ($r \approx 0$ corresponds to copy number 2)
- B Allele Frequency ($b$), which measures the relative number of B alleles in the genotype at each location, e.g. a genotype AB has $b = 1/2$
53 Tumor Heterogeneity: spike-in experiment
Columns show data under 0%, 21% and 50% contamination respectively. The top row shows log-r and the second row the corresponding B-allele frequency. Note, the SNP array measures the aggregated contribution of all cell populations in the sample.
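As a worked example of the second signal, the idealised B allele frequency of a copy-number genotype is simply the proportion of B alleles it contains. The helper below is a hypothetical illustration of that arithmetic for a pure (uncontaminated, noise-free) sample; measured values on a real array are noisy versions of these.

```python
from fractions import Fraction


def b_allele_frequency(genotype):
    """Proportion of B alleles in a genotype string such as 'AAB'."""
    if not genotype or set(genotype) - {"A", "B"}:
        raise ValueError("genotype must be a non-empty string of 'A' and 'B'")
    return Fraction(genotype.count("B"), len(genotype))


# Copy number is the length of the genotype string.
for genotype in ("AA", "AB", "BB", "A", "AAB", "ABB"):
    print(genotype, len(genotype), b_allele_frequency(genotype))
```

A normal heterozygous locus AB sits at $b = 1/2$, while a single-copy gain moves it to $1/3$ (AAB) or $2/3$ (ABB), which is why the B allele frequency carries information that total DNA content alone does not.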
54 Tumor Heterogeneity: deletion-normal
55 The Model
We can model this process using a mixture of hidden Markov models. So we now have two (hidden) states:
- a normal state, $S^{(N)}_i \in \{AA, AB, BB\}$
- a tumour aberration state, $S^{(T)}_i \in \{A, B, AAA, AAB, \ldots\}$
- and a mixing proportion, $w_i \in (0, 1)$, which can vary across the genome depending on the % of tumour cells in the sample that have an aberration genotype
56 The model is written as
$$\Pr(x_i) = w_i f(x_i \mid S^{(N)}_i) + (1 - w_i) f(x_i \mid S^{(T)}_i)$$
$$\Pr(S^{(N)}_i) = \mathrm{Dir}(\alpha_1, \alpha_2)$$
$$\Pr(S^{(T)}_i \mid S^{(N)}_i, S^{(T)}_{i-1}) = \sum_{j} v_j \, \mathrm{Dir}[\alpha_j(S^{(N)}_i, S^{(T)}_{i-1})]$$
$$w_i \sim \mathrm{Be}(a, b)$$
where $x_i = \{\log R, \text{B.freq}\}$.
Note, the allowable states of $S^{(T)}_i$ are restricted by $S^{(N)}_i$, the normal genotype. E.g. if the normal is homozygous, $S^{(N)}_i = \{AA\}$, then the tumour copy-number genotype must be in $S^{(T)}_i \in \{A, AAA, AAAA, \ldots\}$.
57 Likelihood
We need a likelihood (sampling distribution) for $f(x_i \mid S_i)$.
58 Likelihood
Normal densities are non-robust and lead to appalling predictions. We have investigated mixture models for the within-state sampling distributions (NAR, 2007; Genome Biol., 2010; JRSSB, 2011).
1. Semi-parametric mixture of Student densities,
$$f(x_i \mid S_i) = \sum_{j=1}^{K} v_j \, \mathrm{St}_{\nu}(\mu_{j, S_i}, V_{S_i}), \qquad (v_1, \ldots, v_K) \sim \mathrm{Dir}(\epsilon, \epsilon, \ldots)$$
with $\epsilon$ set small to encourage sparsity.
2. Non-parametric mixtures using the Dirichlet Process (MDP),
$$f(x_i \mid S_i) = \int N(x_i \mid \theta) \, dP(\theta), \qquad P \sim \mathrm{DP}(\alpha, P_0)$$
which is formally an infinite mixture model, with a Dirichlet Process with concentration parameter $\alpha$ and baseline measure $P_0$.
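To see what the Student mixture buys in robustness, the density can be evaluated directly. The sketch below is a univariate simplification (the data in the talk are bivariate, $\{\log R, \text{B.freq}\}$) with a shared scale and degrees of freedom across components; function names and parameter values are assumptions for illustration.

```python
import math


def student_t_pdf(x, mu, scale, nu):
    """Density of a location-scale Student-t with nu degrees of freedom."""
    z = (x - mu) / scale
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi) - math.log(scale))
    return math.exp(log_norm - 0.5 * (nu + 1) * math.log1p(z * z / nu))


def normal_pdf(x, mu, scale):
    """Gaussian density, included only for comparison."""
    z = (x - mu) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))


def student_mixture_pdf(x, weights, locations, scale, nu):
    """Finite mixture sum_j v_j St_nu(mu_j, scale) evaluated at x."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to one")
    return sum(v * student_t_pdf(x, mu, scale, nu)
               for v, mu in zip(weights, locations))


# Hypothetical within-state mixture with two components.
weights, locations = [0.7, 0.3], [0.0, 0.5]
outlier = 6.0
t_density = student_mixture_pdf(outlier, weights, locations, 1.0, 4.0)
gauss_density = sum(v * normal_pdf(outlier, mu, 1.0)
                    for v, mu in zip(weights, locations))
print(t_density, gauss_density)
```

At an outlying observation the Student mixture assigns a density several orders of magnitude larger than the Gaussian mixture with the same locations and scale, so a single aberrant probe has far less power to force a spurious state change.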
59 The MDP model requires careful computational strategies to be applicable to genomic-scale data (JRSSB, 2011). When dealing with 100s of samples, we tend to adopt the Student mixture and make use of the EM algorithm from dispersed starting points.
Having learnt a model, we then wish to explore predictions under $\Pr(S^{(T)} \mid x^{(N)}, x^{(B)})$,
$$\Pr(S^{(T)} \mid x^{(N)}, x^{(B)}) = \sum_{S^{(N)}} \int \Pr(S^{(T)}, S^{(N)}, \theta \mid x^{(N)}, x^{(B)}) \, d\theta$$
where $\theta$ contains the mixing weights, the transition parameters of the HMM and the parameters in the likelihood.
$S^{(T)}$ has around 21 states, hence we use a sparse loss matrix $l(Z_i, Z_{i+1}, S_i, S_{i+1})$ that penalises calls from the normal state and false transitions.
60 Colon cancer: Ch11
61 Colon cancer: Ch12
62 Colon cancer: Ch7
63 Conclusions
- Subjective Bayesian statistics provides a prescriptive formal framework on how to think about data analysis
  - Build your best model of nature; don't worry (too much) about how the model is to be used
  - Define your loss and suggest actions accordingly
- Often $\hat{a}$, the best action, cannot be computed with certainty
  - Solution: solve a different problem precisely
  - Optimal predictions under computational constraints
- These approaches provide state-of-the-art predictions for structural variation in cancer genomics
64 Thank you
Relevant papers:
- CNV discovery (identification of novel CNVs): NAR, 2007; JRSSB, 2011; Gen. Biol., 2010
- CNV classification (calling of CNVs within multiple samples at known loci): Gen. Epi., 2011
- CNV association (association between CNVs and traits of interest, response variables, such as disease status): Nature, 2010; Leukemia, 2011 (under revision)
65 Extensions: Sequential (longitudinal) model
Problem
- Motivated by the CLL study. Multiple samples from the same patient at different times: Normal, Diagnosis, Treatment, Relapse
- Track changes in copy number/LOH profile over time
- Borrow information from across samples to improve calling
Model
- data at each locus: $\{x^{(N)}_i, x^{(D)}_i, x^{(Tr)}_i, x^{(R)}_i\}$
- states: $\{S^{(N)}_i, S^{(T)}_i\}$
- tumour state composition %s: $\{w^{(N)}_i = 0, w^{(D)}_i, w^{(Tr)}_i, w^{(R)}_i\}$
- Interest is in identifying regions whereby $w^{(R)}_i - w^{(Tr)}_i$ is large, and in trends
66 Sequential model The duplication at diagnosis is not detected using single-sample analysis.
67 Sequential model
68 Population model
Problem
- Multiple tumour samples from different patients
- Cluster tumours into distinct groups based on copy number/LOH profile
- Borrow structural information across samples to improve inference
Model
- Maintain a pool of tumour states $\{S^{(T)}_1, \ldots, S^{(T)}_K\}$
- Define a nonparametric prior on these states so that $K$ is adaptive
- Allow each tumour to sample from this population pool with some probability
- Interest is in the pool of tumour states and how they associate with clinical traits
69 Population model

Individual  Tumor Type  Sample   SNP 1  SNP 2  SNP 3  SNP 4  SNP 5
1           1           Tumor    AB     AB     ABB    AAB    AAA
                        Normal   AB     AB     AB     AB     AA
2           2           Tumor    A      A      AB     AB     AA
                        Normal   A      A      AB     AB     AA
3           1           Tumor    AA     AB     ABB    ABB    BB
                        Normal   AA     AB     AB     AB     BB
4           3           Tumor    B      B      ABB    ABB    BB
                        Normal   AB     AB     AB     AB     BB
5           3           Tumor    A      A      AAB    ABB    AAA
                        Normal   AB     AA     AB     AB     AA

Each individual also carries a % Normal value, and each of the $K$ tumor types has a copy number at each SNP, where $K$ is unknown, $K \in \{1, 2, \ldots\}$.
70 Population model: colon cancer, 500 samples, chromosome 8
And we can investigate association between the clusters and clinical traits.
Association studies and regression
Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration
More informationLinear Dynamical Systems
Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations
More informationLinear Regression (1/1/17)
STA613/CBB540: Statistical methods in computational biology Linear Regression (1/1/17) Lecturer: Barbara Engelhardt Scribe: Ethan Hada 1. Linear regression 1.1. Linear regression basics. Linear regression
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationRobust Detection and Identification of Sparse Segments in Ultra-High Dimensional Data Analysis
Robust Detection and Identification of Sparse Segments in Ultra-High Dimensional Data Analysis Hongzhe Li hongzhe@upenn.edu, http://statgene.med.upenn.edu University of Pennsylvania Perelman School of
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationSTA 414/2104: Machine Learning
STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far
More informationProbabilistic Graphical Models Homework 2: Due February 24, 2014 at 4 pm
Probabilistic Graphical Models 10-708 Homework 2: Due February 24, 2014 at 4 pm Directions. This homework assignment covers the material presented in Lectures 4-8. You must complete all four problems to
More informationMulti-state Models: An Overview
Multi-state Models: An Overview Andrew Titman Lancaster University 14 April 2016 Overview Introduction to multi-state modelling Examples of applications Continuously observed processes Intermittently observed
More informationClass 4: Classification. Quaid Morris February 11 th, 2011 ML4Bio
Class 4: Classification Quaid Morris February 11 th, 211 ML4Bio Overview Basic concepts in classification: overfitting, cross-validation, evaluation. Linear Discriminant Analysis and Quadratic Discriminant
More informationModel-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate
Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Lucas Janson, Stanford Department of Statistics WADAPT Workshop, NIPS, December 2016 Collaborators: Emmanuel
More informationLearning ancestral genetic processes using nonparametric Bayesian models
Learning ancestral genetic processes using nonparametric Bayesian models Kyung-Ah Sohn October 31, 2011 Committee Members: Eric P. Xing, Chair Zoubin Ghahramani Russell Schwartz Kathryn Roeder Matthew
More informationQuantitative Genomics and Genetics BTRY 4830/6830; PBSB
Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 20: Epistasis and Alternative Tests in GWAS Jason Mezey jgm45@cornell.edu April 16, 2016 (Th) 8:40-9:55 None Announcements Summary
More informationBrief Introduction of Machine Learning Techniques for Content Analysis
1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview
More informationPost-selection Inference for Changepoint Detection
Post-selection Inference for Changepoint Detection Sangwon Hyun (Justin) Dept. of Statistics Advisors: Max G Sell, Ryan Tibshirani Committee: Will Fithian (UC Berkeley), Alessandro Rinaldo, Kathryn Roeder,
More informationBayesian Regression (1/31/13)
STA613/CBB540: Statistical methods in computational biology Bayesian Regression (1/31/13) Lecturer: Barbara Engelhardt Scribe: Amanda Lea 1 Bayesian Paradigm Bayesian methods ask: given that I have observed
More informationOverview of Statistical Tools. Statistical Inference. Bayesian Framework. Modeling. Very simple case. Things are usually more complicated
Fall 3 Computer Vision Overview of Statistical Tools Statistical Inference Haibin Ling Observation inference Decision Prior knowledge http://www.dabi.temple.edu/~hbling/teaching/3f_5543/index.html Bayesian
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationEquivalence of random-effects and conditional likelihoods for matched case-control studies
Equivalence of random-effects and conditional likelihoods for matched case-control studies Ken Rice MRC Biostatistics Unit, Cambridge, UK January 8 th 4 Motivation Study of genetic c-erbb- exposure and
More informationBasic math for biology
Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood
More informationCOS513 LECTURE 8 STATISTICAL CONCEPTS
COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions
More informationLinear models for the joint analysis of multiple. array-cgh profiles
Linear models for the joint analysis of multiple array-cgh profiles F. Picard, E. Lebarbier, B. Thiam, S. Robin. UMR 5558 CNRS Univ. Lyon 1, Lyon UMR 518 AgroParisTech/INRA, F-75231, Paris Statistics for
More informationFeature Engineering, Model Evaluations
Feature Engineering, Model Evaluations Giri Iyengar Cornell University gi43@cornell.edu Feb 5, 2018 Giri Iyengar (Cornell Tech) Feature Engineering Feb 5, 2018 1 / 35 Overview 1 ETL 2 Feature Engineering
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationHybrid Dirichlet processes for functional data
Hybrid Dirichlet processes for functional data Sonia Petrone Università Bocconi, Milano Joint work with Michele Guindani - U.T. MD Anderson Cancer Center, Houston and Alan Gelfand - Duke University, USA
More informationRonald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California
Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University
More informationLecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010
Hidden Lecture 4: Hidden : An Introduction to Dynamic Decision Making November 11, 2010 Special Meeting 1/26 Markov Model Hidden When a dynamical system is probabilistic it may be determined by the transition
More informationComputational Systems Biology: Biology X
Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#7:(Mar-23-2010) Genome Wide Association Studies 1 The law of causality... is a relic of a bygone age, surviving, like the monarchy,
More informationSeq2Seq Losses (CTC)
Seq2Seq Losses (CTC) Jerry Ding & Ryan Brigden 11-785 Recitation 6 February 23, 2018 Outline Tasks suited for recurrent networks Losses when the output is a sequence Kinds of errors Losses to use CTC Loss
More informationMODULE -4 BAYEIAN LEARNING
MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities
More informationBayesian model selection: methodology, computation and applications
Bayesian model selection: methodology, computation and applications David Nott Department of Statistics and Applied Probability National University of Singapore Statistical Genomics Summer School Program
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More informationAlgorithmisches Lernen/Machine Learning
Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines
More informationConditional Random Fields and beyond DANIEL KHASHABI CS 546 UIUC, 2013
Conditional Random Fields and beyond DANIEL KHASHABI CS 546 UIUC, 2013 Outline Modeling Inference Training Applications Outline Modeling Problem definition Discriminative vs. Generative Chain CRF General
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationBTRY 4830/6830: Quantitative Genomics and Genetics
BTRY 4830/6830: Quantitative Genomics and Genetics Lecture 23: Alternative tests in GWAS / (Brief) Introduction to Bayesian Inference Jason Mezey jgm45@cornell.edu Nov. 13, 2014 (Th) 8:40-9:55 Announcements
More informationMathematical Formulation of Our Example
Mathematical Formulation of Our Example We define two binary random variables: open and, where is light on or light off. Our question is: What is? Computer Vision 1 Combining Evidence Suppose our robot
More informationLatent Dirichlet Allocation
Outlines Advanced Artificial Intelligence October 1, 2009 Outlines Part I: Theoretical Background Part II: Application and Results 1 Motive Previous Research Exchangeability 2 Notation and Terminology
More informationExpression Data Exploration: Association, Patterns, Factors & Regression Modelling
Expression Data Exploration: Association, Patterns, Factors & Regression Modelling Exploring gene expression data Scale factors, median chip correlation on gene subsets for crude data quality investigation
More informationMachine Learning & Data Mining Caltech CS/CNS/EE 155 Hidden Markov Models Last Updated: Feb 7th, 2017
1 Introduction Let x = (x 1,..., x M ) denote a sequence (e.g. a sequence of words), and let y = (y 1,..., y M ) denote a corresponding hidden sequence that we believe explains or influences x somehow
More informationCalculation of IBD probabilities
Calculation of IBD probabilities David Evans and Stacey Cherny University of Oxford Wellcome Trust Centre for Human Genetics This Session IBD vs IBS Why is IBD important? Calculating IBD probabilities
More informationGWAS IV: Bayesian linear (variance component) models
GWAS IV: Bayesian linear (variance component) models Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS IV: Bayesian