Decision Theoretic Classification of Copy-Number-Variation in Cancer Genomes


1 Decision Theoretic Classification of Copy-Number-Variation in Cancer Genomes. Christopher Holmes (joint work with Chris Yau), Department of Statistics & Wellcome Trust Centre for Human Genetics, University of Oxford, & MRC Harwell. Bordeaux, October 2011

2 Overview. Motivating problem: copy-number-variation in the human and cancer genomes. Classification models for sequences: state-space hidden Markov models; how to make predictions?; optimal predictions under computational constraints. Application: copy-number-aberrations in cancer genomes. Conclusions

3 Copy-Number-Variation. CNVs are quite common in normal germline (sperm/egg) genomes, and very common in cancers, where they are known as CNAs. Around 7% to 10% of the human genome is under copy number variation. Somatic copy number variation induced during mitosis (cell division), termed Copy Number Aberrations (CNAs), is a key driver of cancer

4 Typical CNV data file Probe based technologies allow one to (noisily) measure the amount of genomic content at pre-defined positions across the genome We are interested in statistical approaches to identify these regions of CNVs across 100,000s of loci

5 Classification of a linear sequence. This can be thought of as a generic task in classification on ordered data. That is, suppose you have general predictors ordered in some sequence $\{x^{(1)}, x^{(2)}, \ldots, x^{(T)}\}$, say by time, genomic location, a covariate, etc., and you wish to build a classifier for each observation over $S_i \in \{1, \ldots, M\}$ classes, $\Pr(S_i = j \mid x^{(i)})$. But here there is persistence (dependence) with index in the classifier, $\Pr(S_i = j, S_{i-1} = k \mid x^{(i)}, x^{(i-1)}) \neq \Pr(S_i = j \mid x^{(i)}) \Pr(S_{i-1} = k \mid x^{(i-1)})$

6 Classification of a linear sequence. Note, this is not a conventional classification task, as we don't have access to labelled training data $\{S_i, x^{(i)}\}_{i=1}^{T}$. It's not a conventional clustering task either, as the number, $M$, and the labels of the classes are well defined, {deletion, normal, duplication, etc.}. It is part classification (the goal) and part clustering, and we know that the class label persists along the index, $\Pr(S_i = j \mid S_{i-1} = j) > \Pr(S_i = j)$

7 Classification. Two approaches are typically adopted in the literature. Partition or change-point models divide the region up into $K$ (unknown) contiguous blocks and then post-classify (assign) each region to a class, $f(x) = \prod_{j=1}^{K} f(x_j \mid \theta_j)$. Hidden state-space models jointly find the regions AND classify each region directly according to the (hidden) class state $S_j$, $f(x) = \prod_{j=1}^{K} f(x_j \mid \theta_{S_j})$

8 State-space model. To capture dependence in copy-number state across loci we can use a Hidden Markov Model (HMM) with hidden states $S_i \in \{1, \ldots, M\}$: $f(x \mid S) = \prod_{i=1}^{T} f(x_i \mid \theta_{S_i})$ and $\Pr(S_{i+1} = j \mid S_i = k) = \Pi_{jk}$, where $\Pi$ is the $M \times M$ transition matrix and $\theta_{S_j}$ are the parameters in the likelihood associated with state $S_j$. So the transition matrix $\Pi$ allows us to capture (extensive) prior beliefs regarding the length scales of CNV events. Note, $\Pi_{jj}$ governs the expected holding time (span) in state $S_i = j$. Change-points in classification occur when $S_i \neq S_{i-1}$
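
To make the holding-time remark concrete, here is a minimal sketch (not from the slides; the three states, their labels and the transition probabilities are made up for illustration) showing how the diagonal of the transition matrix translates into an expected span in loci.

```python
import numpy as np

# Toy 3-state CNV chain: hypothetical states 0 = deletion, 1 = normal, 2 = duplication.
# Large diagonal entries encode the prior belief that copy-number states persist
# across many adjacent loci.
Pi = np.array([
    [0.990, 0.009, 0.001],
    [0.0005, 0.999, 0.0005],
    [0.001, 0.009, 0.990],
])

# For a homogeneous Markov chain the holding time in state j is geometric,
# with expectation 1 / (1 - Pi[j, j]).
expected_span = 1.0 / (1.0 - np.diag(Pi))
print(expected_span)  # approx. [100, 1000, 100] loci
```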

9 Inference. For the purposes of this talk, we assume that you have adopted your favourite Bayesian model, $\Pr(x, \theta, S)$, and computational algorithm to obtain an approximation to the marginal posterior density $\Pr(S \mid x) = \int \Pr(S, \theta \mid x)\, d\theta$. The question is: now what are we going to do with it? Note, $\Pr(S \mid x)$ is typically defined over more than $30^{20{,}000}$ possible state sequences

10 Predictions Most scientists wish to explore predictions under the model to highlight probable CNVs, and maybe rank them or look for association with disease traits But how to choose a prediction? How can we grade the quality (or utility) of a classification prediction on a sequence?

11 The nature of errors. In conventional classification models, or polychotomous regression, a good way to assess the performance is to count errors via some statistic, $\epsilon = \sum_{i=1}^{T} l(\hat{S}_i, S_i)$, where $\hat{S}_i$ is the prediction and $S_i$ is the true state nature obtains. But when classifying on a sequence of states this is less straightforward. For illustration, consider the following binary classification where there is a notion of a null state, $S_i = 0$, and an alternative, $S_i = 1$

12 Nature of errors Both (a) and (b); and (c) and (d), have the same number of pointwise errors

13 Judging predictions. It's clear that judging the performance of the model is not as simple as that for regular classification models. Moreover, it is clear that the performance is explicitly linked to what you're going to do with the prediction (action-orientated). So two problems exist: (a) we need a way to judge performance, that is, we need a loss function; (b) given a loss function we need a way to find $\hat{S}$, the state sequence (prediction) of minimum expected loss with respect to $\Pr(S \mid x)$, among in the region of $30^{20{,}000}$ candidate state sequences for the problems we shall consider

14 Standard summaries for Hidden Markov models. There are two (ubiquitous) approaches to reporting states from HMMs. 1. The most probable or MAP sequence, using the Viterbi algorithm, $\hat{S} = \arg\max_{Z} \Pr(S = Z \mid x)$. 2. The sequence of states that maximize the posterior marginals, using the forward-backward algorithm, $\hat{S} = \{\hat{S}_1, \ldots, \hat{S}_T\}$ with $\hat{S}_i = \arg\max_{Z_i} \Pr(S_i = Z_i \mid x)$ and $\Pr(S_i \mid x) = \sum_{S_{-i}} \Pr(S \mid x)$, where $S_{-i}$ denotes all states other than the $i$th. Under both schemes $\hat{S}$ can be found in order $O(TM^2)$ computation. But do these relate to sensible choices?
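
For reference, a minimal sketch (not from the slides) of the two standard summaries, working in log space; the transition matrix here uses the row = current state, column = next state convention, i.e. Pi[k, j] = Pr(S_{i+1} = j | S_i = k).

```python
import numpy as np
from scipy.special import logsumexp

def viterbi(log_pi0, log_Pi, log_emit):
    """MAP sequence argmax_Z Pr(S = Z | x).
    log_pi0: (M,) initial log-probabilities; log_Pi: (M, M) log transition matrix;
    log_emit: (T, M) log emission densities log f(x_i | S_i = j)."""
    T, M = log_emit.shape
    delta = log_pi0 + log_emit[0]
    back = np.zeros((T, M), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_Pi          # scores[k, j]: best path ending k -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

def posterior_marginals(log_pi0, log_Pi, log_emit):
    """Pointwise posteriors Pr(S_i = j | x) via the forward-backward recursions."""
    T, M = log_emit.shape
    log_alpha = np.zeros((T, M))
    log_beta = np.zeros((T, M))
    log_alpha[0] = log_pi0 + log_emit[0]
    for t in range(1, T):
        log_alpha[t] = log_emit[t] + logsumexp(log_alpha[t - 1][:, None] + log_Pi, axis=0)
    for t in range(T - 2, -1, -1):
        log_beta[t] = logsumexp(log_Pi + (log_emit[t + 1] + log_beta[t + 1])[None, :], axis=1)
    log_post = log_alpha + log_beta
    log_post -= logsumexp(log_post, axis=1, keepdims=True)
    return np.exp(log_post)
```

The marginals returned by the second routine are exactly the quantities thresholded on the following slides.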

15 Analysis on data

16 Standard summaries for Hidden Markov models. 1. The most probable or MAP sequence looks good, but it's inflexible: if I wish to consider more calls, or fewer calls, I have no handle on this. 2. For the sequence of states that maximize the posterior marginals you do have a simple handle on making calls, $\hat{S}_i = 0$ if $\Pr(S_i = 1 \mid x) < \alpha$ and $\hat{S}_i = 1$ otherwise, where $\alpha$ controls the trade-off between false positives and false negatives (the FDR). However, does this give sensible predictions as you vary $\alpha$?
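
A one-line sketch of that thresholding rule (again not from the slides), taking the state-1 column of the marginals computed above:

```python
import numpy as np

def threshold_calls(marginal_p1, alpha):
    """Marginal calling rule: S_i = 1 wherever Pr(S_i = 1 | x) >= alpha, else 0."""
    return (np.asarray(marginal_p1) >= alpha).astype(int)
```

Lowering $\alpha$ produces more calls and raising it produces fewer, which is the handle referred to above.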

17 Analysis on data

18 Analysis on data

19 Analysis on data

20 Analysis on data

21 Analysis on data

22 Analysis on data

23 Calling using posterior marginals. So, on the positive side, for the posterior marginals we can find $\hat{S}$ exactly under $\Pr(S \mid x)$ using the forward-backward algorithm, and the predictions are flexible with respect to some FDR parameter, monotone in $\alpha$. On the negative side, the predictions don't look so good, as we're really after the CNV events rather than pointwise predictions. To see why this is so we can take a closer look at the implicit cost functions that the MAP, via Viterbi, and the posterior marginals, via forward-backward, are solving

24 Standard summaries for Hidden Markov models. 1. The most probable or MAP sequence, $\hat{S} = \arg\max_Z \Pr(S = Z \mid x)$, relates to a 0-1 loss over the whole sequence, $l(Z, S) = 0$ if $Z = S$ and $l(Z, S) = 1$ if $Z \neq S$, where $Z$ is the prediction and nature obtains $S$. 2. The sequence of states that maximize the marginals, $\hat{S}_i = \arg\max_{Z_i} \Pr(S_i = Z_i \mid x)$, relates to a pointwise loss invariant to permutations of the index, $l(Z, S) = \sum_i l(Z_i, S_i)$ with $l(Z_i, S_i) = 0$ if $Z_i = S_i$ and $l(Z_i, S_i) = 1$ if $Z_i \neq S_i$

25 These loss functions are not really suited to the way people wish to look at CNV predictions. However, to a large extent, the MAP and the set of marginals seem to be the only methods considered in around 30 years of HMMs

26 Loss functions under computational constraints. One can ponder a proper loss function $l(Z, S)$ appropriate for CNV classification... but then $\hat{S} = \arg\min_{Z} E_{\Pr(S \mid x)}[l(Z, S)]$ is not computable (for the ones I could think of)

27 Optimal predictions under computational constraints. Solution: think of a loss function with which you can compute $\hat{S}$ and which solves an approximating optimal prediction problem. That is, solve a different but related decision problem precisely

28 Markov loss functions. We have developed the use of $k$th-order Markov loss functions specifically for calling CNVs, $l(Z, S) = \sum_{i=1}^{T-k} l(Z_i, \ldots, Z_{i+k}, S_i, \ldots, S_{i+k})$, and in particular we shall consider the simplest case of first-order loss, $l(Z, S) = \sum_{i=1}^{T-1} l(Z_i, Z_{i+1}, S_i, S_{i+1})$. Assuming the loss is homogeneous (does not alter over the sequence) this leads to an $M^{k+1} \times M^{k+1}$ loss matrix with $M^{2k+2}$ entries, which, for $M = 2$, $k = 1$, we can write as follows,

29 Table: loss matrix $l(Z_i, Z_{i+1}, S_i, S_{i+1})$ for $M = 2$ binary state transitions. Rows index the call $(Z_i, Z_{i+1})$ and columns index nature $(S_i, S_{i+1})$, each over $(0,0), (0,1), (1,0), (1,1)$; a call matching nature is Perfect. The call $(0,0)$ row contains the FNH and FN entries, the $(0,1)$ row FPT and XX, the $(1,0)$ row XX and FNT, and the $(1,1)$ row FP and FPH. Legend: Perfect; GoodCorrection; FNH = FalseNegativeHold; FNT = FalseNegativeTransition; FPH = FalsePositiveHold; FPT = FalsePositiveTransition; FN = FalseNegative; FP = FalsePositive; XX = Oh no

30 Adapting your loss. So by adjusting {Perfect, GoodCorrection, FNH, FPH, FNT, FPT, XX, FN, FP} we can adapt the calls to suit the application. Note, if {Perfect, GoodCorrection} = 0 and {FNH, ..., FP} = $\gamma$ then we get back the (zeroth-order) marginal loss. So we can see that the use of posterior marginal calls is equivalent to statements of symmetry on the types of losses incurred

31 Calculating $\hat{S}$, the sequence of minimum loss. The expected first-order loss is given by $E_{\Pr(S \mid x)}[l(Z, S)] = \sum_{S} \left\{ \sum_{i=1}^{T-1} l(Z_i, Z_{i+1}, S_i, S_{i+1}) \right\} \Pr(S \mid x)$, where, by exchanging the order of summation, $E_{\Pr(S \mid x)}[l(Z, S)] = \sum_{i=1}^{T-1} \sum_{S} l(Z_i, Z_{i+1}, S_i, S_{i+1}) \Pr(S \mid x) = \sum_{i=1}^{T-1} \sum_{\{S_i, S_{i+1}\}} l(Z_i, Z_{i+1}, S_i, S_{i+1}) \Pr(S_i, S_{i+1} \mid x)$.
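
Written as code, the last expression only needs the pairwise posterior marginals; a minimal sketch (not from the slides), with the loss stored as a tensor indexed by the called and true transition pairs:

```python
import numpy as np

def expected_first_order_loss(Z, pair_marginals, L):
    """E[l(Z, S)] = sum_i sum_{s,t} L[Z_i, Z_{i+1}, s, t] * Pr(S_i = s, S_{i+1} = t | x).
    Z: (T,) integer calls; pair_marginals: (T-1, M, M) pairwise posteriors;
    L: (M, M, M, M) homogeneous loss tensor l(Z_i, Z_{i+1}, S_i, S_{i+1})."""
    T = len(Z)
    return sum(np.sum(L[Z[i], Z[i + 1]] * pair_marginals[i]) for i in range(T - 1))
```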

32 Dynamic programming. The expected loss is additive, so $\hat{S}$ can be found exactly using dynamic programming recursions (similar to the Viterbi algorithm). For first-order loss the computational cost is $O(M^4 T)$, where $M$ is the number of hidden states and $T$ is the length of the sequence; more generally it is $O(M^{2k+2} T)$ for a $k$th-order Markov loss function. Note, the approach is applicable to any discrete state sequence model $\Pr(S \mid x)$, e.g. change-point models
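
A sketch of that recursion (not from the slides), assuming the pairwise posteriors $\Pr(S_i, S_{i+1} \mid x)$ have already been computed:

```python
import numpy as np

def min_expected_loss_sequence(pair_marginals, L):
    """Call Z minimising the expected first-order Markov loss by dynamic programming.
    pair_marginals: (T-1, M, M) array of Pr(S_i = s, S_{i+1} = t | x);
    L: (M, M, M, M) loss tensor indexed [Z_i, Z_{i+1}, S_i, S_{i+1}]."""
    Tm1, M, _ = pair_marginals.shape
    # c[i, a, b]: expected loss contributed by calling (Z_i, Z_{i+1}) = (a, b)
    c = np.einsum('abst,ist->iab', L, pair_marginals)

    V = np.zeros(M)                      # V[b]: best cost of a prefix whose last call is b
    back = np.zeros((Tm1, M), dtype=int)
    for i in range(Tm1):
        scores = V[:, None] + c[i]       # scores[a, b]
        back[i] = scores.argmin(axis=0)
        V = scores.min(axis=0)

    Z = np.empty(Tm1 + 1, dtype=int)
    Z[-1] = V.argmin()
    for i in range(Tm1 - 1, -1, -1):
        Z[i] = back[i, Z[i + 1]]
    return Z
```

Each position touches the full $M^2 \times M^2$ table of call/nature transition pairs, giving the $O(M^4 T)$ cost quoted above.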

33 Example. For illustration we consider the previous data example with {Perfect, GoodCorrection} = 0, {FN, FNH, FPH} = 1, {FP, FNT, FPT} = $\gamma$ and XX = 500, with $\gamma > 1$, which penalises false positives and also jumps (calls) where there should not have been any. That is, once you have made a prediction of an event, $S_i = 1$, you are committed to it

34 Analysis on data

35 Analysis on data

36 Analysis on data

37 Analysis on data

38 Analysis on data

39 Analysis on data

40 Analysis on data And barring a XX, this case is identical to the marginal loss

41 Behaviour. So by altering the penalties on certain error types we can align the predictions to the context of the analysis. It's important to note that this loss specification is separated from the actual modelling task: 1. build the best possible model of nature, $\pi(x) = \int \Pr(x, \theta)\, d\theta$; 2. define a loss specific to the task at hand and compute $\hat{S}$. These two tasks are separate

42 Motivating ongoing studies. The methodology we present has largely been shaped by two major ongoing collaborations. Clinical diagnostics for leukemia (Anna Schuh, Jenny Taylor): longitudinal analysis (diagnosis/relapse) of Chronic Lymphocytic Leukemia (CLL) blood samples; 400 samples; £1.2m Health Innovation Challenge Fund award (NHS-Wellcome Trust). Ludwig Colon Cancer Initiative (Oliver Seiber, Melbourne LCI): genome-wide -omics study of 1,000+ paired normal-tumour colon cancer samples.

43 Copy-Number-Aberrations (CNAs) in cancer. It is well established that CNAs are key drivers of oncogenesis, through deletion of tumour suppressor genes, duplication of oncogenes, and loss of heterozygosity (LOH). However, real data is much more complicated than the previous illustrative examples: non-standard noise distributions (NP or SP likelihoods), tumour heterogeneity (mixture models), stromal contamination (de-convolution), and longitudinal designs and population samples

44 Copy-Number-Variation One major challenge in the analysis of CNAs is that the sample contains mixtures of population sub-types

45 Tumor Heterogeneity So the sample represents a population of cells from multiple subtypes

46 SNP genotyping arrays. Simply measuring total DNA content, such as in array-CGH, will not help, as neither copy-neutral LOH nor heterogeneity levels are identifiable from the data. Ideally you would like to do single-cell sequencing, but we're still a few years off this. However, technologies originally designed to genotype single-nucleotide polymorphisms (SNPs) turn out to be useful. SNP: a single base in the genome where two (or more) of {A, G, C, T} are found in the population

47 Genotyping So the genotypes form three clusters in the probe-output space But what happens to the signal when you have a CNV at the SNP?

48 CNV

49 Polar coordinates. It's much simpler to work in polar coordinates. So at SNPs the data look like this, and for CNAs...

50 CNAs: in theory it should look like this, but in practice looks like this...

51 Generalized Genotyping: actual (messy) data Noise in the real data means we need to see the signal span multiple adjacent loci in order to be confident of a call

52 Genome-wide data: colon cancer. SNP data is composed of two types of signal: the Log R Ratio ($r$), whose magnitude is related to the total number of alleles ($r \approx 0$ corresponds to copy number 2), and the B Allele Frequency ($b$), which measures the relative number of B alleles in the genotype at each location, e.g. a genotype AB has $b = 1/2$.

53 Tumor heterogeneity: spike-in experiment. Columns show data under 0%, 21% and 50% contamination respectively. The top row shows log-R and the second row the corresponding B-allele frequency. Note, the SNP array measures the aggregated contribution of all cell populations in the sample
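
To make the aggregation concrete, a small illustrative calculation (not from the slides, and assuming the idealised noise-free definitions of the two signals above): the aggregate B-allele frequency is the B-allele count over the total allele count summed across cell populations, and the Log R Ratio is the log2 ratio of total copies to the normal two copies.

```python
import math

def expected_signals(w_normal, normal_genotype, tumour_genotype):
    """Idealised, noise-free aggregate (logR, BAF) for a mixed sample.
    w_normal: fraction of cells with the normal genotype (contamination);
    genotypes are allele strings, e.g. 'AB' for normal, 'A' for a hemizygous deletion."""
    def counts(g):
        return len(g), g.count('B')                  # (total copies, B copies)
    n_tot, n_b = counts(normal_genotype)
    t_tot, t_b = counts(tumour_genotype)
    total = w_normal * n_tot + (1 - w_normal) * t_tot
    baf = (w_normal * n_b + (1 - w_normal) * t_b) / total
    log_r = math.log2(total / 2.0)                   # zero at copy number two
    return log_r, baf

# A deletion ('A') at a heterozygous SNP ('AB') under increasing normal contamination:
for w in (0.0, 0.21, 0.50):
    print(w, expected_signals(w, 'AB', 'A'))
```

As the normal fraction grows, both signals are pulled back towards their copy-number-two values, which is the attenuation visible in the spike-in figures.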

54 Tumor Heterogeneity: deletion-normal

55 The model. We can model this process using a mixture of hidden Markov models. So we now have two (hidden) states: a normal state $S^{(N)}_i \in \{AA, AB, BB\}$ and a tumour aberration state $S^{(T)}_i \in \{A, B, AAA, AAB, \ldots\}$, together with a mixing proportion $w_i \in (0, 1)$ which can vary across the genome depending on the % of tumour cells in the sample that have an aberration genotype

56 The model is written as $\Pr(x_i) = w_i f(x_i \mid S^{(N)}_i) + (1 - w_i) f(x_i \mid S^{(T)}_i)$, with $\Pr(S^{(N)}_i) = \mathrm{Dir}(\alpha_1, \alpha_2)$, $\Pr(S^{(T)}_i = j \mid S^{(N)}_i, S^{(T)}_{i-1}) = v_j$ where $v \sim \mathrm{Dir}[\alpha_j(S^{(N)}_i, S^{(T)}_{i-1})]$, and $w_i \sim \mathrm{Be}(a, b)$, where $x_i = \{\log R, \text{B.freq}\}$. Note, the allowable states of the tumour genotype $S^{(T)}_i$ are restricted by the normal $S^{(N)}_i$. E.g. if the normal is homozygous, $S^{(N)}_i = \{AA\}$, then the tumour copy-number genotype must be in $S^{(T)}_i \in \{A, AAA, AAAA, \ldots\}$

57 Likelihood. We need a likelihood (sampling distribution) for $f(x_i \mid S_i)$

58 Likelihood. Normal densities are non-robust and lead to appalling predictions. We have investigated mixture models for the within-state sampling distributions (NAR 2007; Genome Biol. 2010; JRSSB 2011). 1. A semi-parametric mixture of Student-t distributions, $f(x_i \mid S_i) = \sum_{j=1}^{K} v_j \, \mathrm{St}_\nu(\mu_{j S_i}, V_{S_i})$, with $v \sim \mathrm{Dir}(\epsilon, \epsilon, \ldots)$ and $\epsilon$ set small to encourage sparsity. 2. Non-parametric mixtures using the Dirichlet process (MDP), $f(x_i \mid S_i) = \int N(x_i \mid \theta)\, dP(\theta)$, $P \sim D(\alpha, P_0)$, which is formally an infinite mixture model with a Dirichlet process with concentration parameter $\alpha$ and baseline measure $P_0$.
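
As a small sketch of option 1 (not the fitted model from the slides; the cluster centres, scale and degrees of freedom below are made-up illustrative values, and in practice such parameters would be learnt, e.g. by EM as described on the next slide):

```python
import numpy as np
from scipy.stats import multivariate_t

def student_mixture_pdf(x, weights, locs, shape, df):
    """Within-state emission density: a K-component mixture of Student-t densities,
    f(x | S) = sum_j v_j * St_df(x; mu_j, V), here with a shared scale matrix."""
    return sum(w * multivariate_t.pdf(x, loc=m, shape=shape, df=df)
               for w, m in zip(weights, locs))

# Illustrative parameters for one hidden state in (logR, BAF) space.
weights = np.array([0.6, 0.3, 0.1])
locs = np.array([[0.0, 0.5], [0.0, 0.0], [0.0, 1.0]])   # heterozygous and homozygous clusters
shape = 0.01 * np.eye(2)
print(student_mixture_pdf(np.array([0.05, 0.48]), weights, locs, shape, df=4))
```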

59 The MDP model requires careful computational strategies to be applicable to genomic-scale data (JRSSB, 2011). When dealing with 100s of samples, we tend to adopt the Student mixture and make use of the EM algorithm from dispersed starting points. Having learnt a model we then wish to explore predictions under $\Pr(S^{(T)} \mid x^{(N)}, x^{(B)}) = \sum_{S^{(N)}} \int \Pr(S^{(T)}, S^{(N)}, \theta \mid x^{(N)}, x^{(B)})\, d\theta$, where $\theta$ contains the mixing weights, the transition parameters of the HMM and the parameters in the likelihood. $S^{(T)}$ has around 21 states, hence we use a sparse loss matrix $l(Z_i, Z_{i+1}, S_i, S_{i+1})$ that penalises calls from the normal state and false transitions
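
As a rough illustration of that idea only (not the authors' actual matrix; the state labelled normal and all penalty values below are assumptions), one could populate a loss tensor in which matching call and nature costs nothing, wrongly calling away from the normal state costs more than an ordinary pointwise mistake, and calling a jump where nature holds costs the most; such a tensor plugs directly into the dynamic-programming sketch above.

```python
import numpy as np

def sparse_transition_loss(M, normal=0, miss=1.0, false_call=3.0, false_jump=5.0):
    """Illustrative first-order loss tensor L[a, b, s, t] over M states:
    0 when the called pair (a, b) equals nature (s, t); `false_call` when a
    non-normal state is called while nature is normal; `false_jump` when a jump
    a != b is called while nature holds (s == t); `miss` otherwise."""
    L = np.full((M, M, M, M), miss)
    for a in range(M):
        for b in range(M):
            for s in range(M):
                for t in range(M):
                    if (a, b) == (s, t):
                        L[a, b, s, t] = 0.0
                    elif (a != normal and s == normal) or (b != normal and t == normal):
                        L[a, b, s, t] = false_call
                    elif a != b and s == t:
                        L[a, b, s, t] = false_jump
    return L
```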

60 Colon cancer: Ch11

61 Colon cancer: Ch12

62 Colon cancer: Ch7

63 Conclusions. Subjective Bayesian statistics provides a prescriptive formal framework for how to think about data analysis. Build your best model of nature; don't worry (too much) about how the model is to be used. Define your loss and suggest actions accordingly. Often $\hat{a}$, the best action, cannot be computed with certainty. Solution: solve a different problem precisely, giving optimal predictions under computational constraints. These approaches provide state-of-the-art predictions for structural variation in cancer genomics

64 Thank you. Relevant papers: CNV discovery (identification of novel CNVs): NAR, 2007; JRSSB, 2011; Gen. Biol., 2010. CNV classification (calling of CNVs within multiple samples at known loci): Gen. Epi., 2011. CNV association (association between CNVs and traits of interest, response variables such as disease status): Nature, 2010; Leukemia, 2011 (under revision)

65 Extensions: sequential (longitudinal) model. Problem: motivated by the CLL study. Multiple samples from the same patient at different times (Normal, Diagnosis, Treatment, Relapse). Track changes in the copy number/LOH profile over time. Borrow information from across samples to improve calling. Model: data at each locus $\{x^{(N)}_i, x^{(D)}_i, x^{(Tr)}_i, x^{(R)}_i\}$; states $\{S^{(N)}_i, S^{(T)}_i\}$; tumour state composition %s $\{w^{(N)}_i, w^{(D)}_i, w^{(Tr)}_i, w^{(R)}_i\}$ with $w^{(N)} = 0$. Interest is in identifying regions where $w^{(R)}_i$ is large, and in trends

66 Sequential model The duplication at diagnosis is not detected using single-sample analysis.

67 Sequential model

68 Population model. Problem: multiple tumour samples from different patients. Cluster tumours into distinct groups based on copy number/LOH profile. Borrow structural information across samples to improve inference. Model: maintain a pool of tumour states $\{S^{(T)}_1, \ldots, S^{(T)}_K\}$; define a nonparametric prior on these states so that $K$ is adaptive; allow each tumour to sample from this population pool with some probability. Interest is in the pool of tumour states and how they associate with clinical traits

69 Population model (schematic). A pool of $K$ tumour types, each defined by a copy-number genotype across the SNPs, where $K$ is unknown, $K \in \{1, 2, \ldots\}$. Each individual is assigned a tumour type and observed as a mixture of its tumour and normal genotypes plus a % normal. Individual 1 (tumour type 1): tumour AB AB ABB AAB AAA, normal AB AB AB AB AA. Individual 2 (tumour type 2): tumour A A AB AB AA, normal A A AB AB AA. Individual 3 (tumour type 1): tumour AA AB ABB ABB BB, normal AA AB AB AB BB. Individual 4 (tumour type 3): tumour B B ABB ABB BB, normal AB AB AB AB BB. Individual 5 (tumour type 3): tumour A A AAB ABB AAA, normal AB AA AB AB AA.

70 Population model: colon cancer, 500 samples, chrm 8 And we can investigate association between the clusters and clinical traits
