Nonparametric Bayes tensor factorizations for big data


Slide 1: Nonparametric Bayes tensor factorizations for big data
David Dunson, Department of Statistical Science, Duke University
Funded by NIH R01-ES017240, R01-ES & DARPA N C-2082

Slide 2: Outline
- Motivation
- Conditional tensor factorizations
- Some properties - heuristic & otherwise
- Computation & applications
- Generalizations

Slide 3: Motivating setting - high-dimensional predictors
- Routine to encounter massive-dimensional prediction & variable selection problems
- We have y ∈ Y & x = (x_1, ..., x_p) ∈ X
- Unreasonable to assume linearity or additivity in motivating applications - e.g., epidemiology, genomics, neurosciences
- Goal: nonparametric approaches that accommodate large p, small n, allow interactions, scale computationally to big p

Slide 4: Gaussian processes with variable selection
- For Y = R & X ⊂ R^p, one approach lets y_i = μ(x_i) + ε_i, ε_i ~ N(0, σ²), where μ : X → R is an unknown regression function
- Following Zou et al. (2010) & others,
$$\mu \sim \mathrm{GP}(m, c), \qquad c(x, x') = \phi \exp\Big\{-\sum_{j=1}^{p} \alpha_j (x_j - x'_j)^2\Big\},$$
  with mixture priors placed on the α_j's
- Zou et al. (2010) show good empirical results
- Bhattacharya, Pati & Dunson (2011) - minimax adaptive rates
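A minimal sketch (assuming numpy; the data and hyperparameters are illustrative placeholders) of this anisotropic squared-exponential covariance, where a coefficient α_j = 0 effectively removes predictor j from the fit - the mechanism the mixture priors on the α_j's exploit for variable selection:

```python
import numpy as np

def gp_cov(X1, X2, phi, alpha):
    """c(x, x') = phi * exp(-sum_j alpha_j * (x_j - x'_j)**2)."""
    diff2 = (X1[:, None, :] - X2[None, :, :]) ** 2      # (n1, n2, p)
    return phi * np.exp(-np.tensordot(diff2, alpha, axes=([2], [0])))

# toy usage: 5 points in p = 3 dimensions, only the first predictor active
rng = np.random.default_rng(0)
X = rng.uniform(size=(5, 3))
K = gp_cov(X, X, phi=1.0, alpha=np.array([2.0, 0.0, 0.0]))
print(K.shape)  # (5, 5)
```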

Slide 5: Issues & alternatives
- Mean regression & computation challenging
- Difficult computationally beyond the conditionally Gaussian homoscedastic case
- Density regression interesting, as the variance & shape of the response distribution often change with x
- Initial focus: classification from many categorical predictors
- Approach generalizes directly to arbitrary Y and X

Slide 6: Classification & conditional probability tensors
- Suppose Y ∈ {1, ..., d_0} & X_j ∈ {1, ..., d_j}, j = 1, ..., p
- The classification function or conditional probability is Pr(Y = y | X_1 = x_1, ..., X_p = x_p) = P(y | x_1, ..., x_p)
- This classification function can be structured as a d_0 × d_1 × ⋯ × d_p tensor
- Let P_{d_1,...,d_p}(d_0) denote the set of all possible conditional probability tensors
- P ∈ P_{d_1,...,d_p}(d_0) implies P(y | x_1, ..., x_p) ≥ 0 for all y, x_1, ..., x_p & ∑_{y=1}^{d_0} P(y | x_1, ..., x_p) = 1
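As a concrete illustration (a sketch with arbitrary small dimensions, assuming numpy), such a conditional probability tensor can be stored as a (d_0, d_1, ..., d_p) array whose response mode sums to one in every predictor cell:

```python
import numpy as np

rng = np.random.default_rng(1)
d0, d1, d2 = 2, 3, 4                     # response levels, two categorical predictors

# build a valid member of P_{d1,d2}(d0) by normalizing positive entries over y
P = rng.gamma(1.0, size=(d0, d1, d2))
P /= P.sum(axis=0, keepdims=True)

assert np.all(P >= 0)                    # nonnegativity
assert np.allclose(P.sum(axis=0), 1.0)   # sums to one over y in every cell

# P[y-1, x1-1, x2-1] stores Pr(Y = y | X1 = x1, X2 = x2)
print(P[:, 0, 0])
```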

Slide 7: Tensor factorizations
- P = big tensor & data will be very sparse
- If P were a matrix, we might think of the SVD
- We can instead consider a tensor factorization
- A common approach is PARAFAC - a sum of rank-one tensors
- Tucker factorizations express a d_1 × ⋯ × d_p tensor A = {a_{c_1 ⋯ c_p}} as
$$a_{c_1 \cdots c_p} = \sum_{h_1=1}^{d_1} \cdots \sum_{h_p=1}^{d_p} g_{h_1 \cdots h_p} \prod_{j=1}^{p} u^{(j)}_{h_j c_j},$$
  where G = {g_{h_1 ⋯ h_p}} is a core tensor and the u^{(j)} are mode-specific factors
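A sketch (assuming numpy and small illustrative dimensions, with p = 2 for readability) of reconstructing a tensor from a Tucker core and mode factors; `einsum` carries out the double sum over (h_1, h_2):

```python
import numpy as np

rng = np.random.default_rng(2)
d1, d2 = 4, 5                        # tensor dimensions
k1, k2 = 2, 3                        # multilinear (core) ranks

G = rng.normal(size=(k1, k2))        # core tensor g_{h1 h2}
U1 = rng.normal(size=(k1, d1))       # u^(1)_{h1 c1}
U2 = rng.normal(size=(k2, d2))       # u^(2)_{h2 c2}

# a_{c1 c2} = sum_{h1} sum_{h2} g_{h1 h2} * u^(1)_{h1 c1} * u^(2)_{h2 c2}
A = np.einsum('ab,ac,bd->cd', G, U1, U2)
print(A.shape)                       # (4, 5)
```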

Slide 8: Our factorization (with Yun Yang)
Our proposed nonparametric model for the conditional probability is a Tucker factorization of P:
$$P(y \mid x_1, \ldots, x_p) = \sum_{h_1=1}^{k_1} \cdots \sum_{h_p=1}^{k_p} \lambda_{h_1 h_2 \ldots h_p}(y) \prod_{j=1}^{p} \pi^{(j)}_{h_j}(x_j) \qquad (1)$$
To be a valid conditional probability, the parameters are subject to
$$\sum_{c=1}^{d_0} \lambda_{h_1 h_2 \ldots h_p}(c) = 1 \ \text{ for any } (h_1, h_2, \ldots, h_p), \qquad \sum_{h=1}^{k_j} \pi^{(j)}_{h}(x_j) = 1 \ \text{ for any possible pair } (j, x_j). \qquad (2)$$
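A sketch of evaluating (1) for two categorical predictors (assuming numpy; the dimensions and random parameter values are illustrative). Each core cell is a probability vector over y and each π^{(j)}(x_j) is a probability vector over h_j, so the constraints in (2) hold by construction:

```python
import numpy as np

rng = np.random.default_rng(3)
d0, d = 2, 4                  # response levels, levels per predictor
k = [2, 3]                    # k_j: latent classes for predictors j = 1, 2

# core tensor lambda_{h1 h2}(y): a probability vector over y for every (h1, h2)
lam = rng.dirichlet(np.full(d0, 1.0 / d0), size=(k[0], k[1]))          # (k1, k2, d0)

# pi^(j)_h(x_j): for each level of predictor j, a probability vector over h
pi = [rng.dirichlet(np.full(kj, 1.0 / kj), size=d) for kj in k]        # each (d, k_j)

def cond_prob(x1, x2):
    """P(y | x1, x2) = sum_{h1,h2} lambda_{h1 h2}(y) pi^(1)_{h1}(x1) pi^(2)_{h2}(x2)."""
    w = np.einsum('a,b->ab', pi[0][x1], pi[1][x2])    # weights over (h1, h2)
    return np.einsum('ab,aby->y', w, lam)

probs = cond_prob(0, 3)
print(probs, probs.sum())     # a valid probability vector over y
```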

Slide 9: Comments on proposed factorization
- k_j = 1 corresponds to exclusion of the jth feature
- By placing a prior on k_j, we can induce variable selection & learning of the dimension of the factorization
- The representation is many-to-one and the parameters in the factorization cannot be uniquely identified
- Does not present a barrier to Bayesian inference - we don't care about the parameters in the factorization
- We want to do variable selection, prediction & inferences on predictor effects

Slide 10: Theoretical support
The following theorem formalizes the flexibility:

Theorem. Every d_0 × d_1 × d_2 × ⋯ × d_p conditional probability tensor P ∈ P_{d_1,...,d_p}(d_0) can be decomposed as (1), with 1 ≤ k_j ≤ d_j for j = 1, ..., p. Furthermore, the λ_{h_1 h_2 ... h_p}(y) and π^{(j)}_{h_j}(x_j) can be chosen to be nonnegative and to satisfy the constraints (2).

Slide 11: Latent variable representation
- Simplify the representation by introducing p latent class indicators z_1, ..., z_p for X_1, ..., X_p
- Conditional independence of Y and (X_1, ..., X_p) given (z_1, ..., z_p)
- The model can be written as
$$Y_i \mid z_{i1}, \ldots, z_{ip} \sim \mathrm{Mult}\big(\{1, \ldots, d_0\}, \lambda_{z_{i1}, \ldots, z_{ip}}\big), \qquad z_{ij} \mid X_{ij} = x_j \sim \mathrm{Mult}\big(\{1, \ldots, k_j\}, \pi^{(j)}_{1}(x_j), \ldots, \pi^{(j)}_{k_j}(x_j)\big)$$
- Useful computationally & provides some insight into the model

Slide 12: Prior specification & hierarchical model
- Conditional likelihood of the response:
$$(Y_i \mid z_{i1}, \ldots, z_{ip}, \Lambda) \sim \mathrm{Multinomial}\big(\{1, \ldots, d_0\}, \lambda_{z_{i1}, \ldots, z_{ip}}\big)$$
- Conditional likelihood of the latent class variables:
$$(z_{ij} \mid X_{ij} = x_j, \pi) \sim \mathrm{Multinomial}\big(\{1, \ldots, k_j\}, \pi^{(j)}_{1}(x_j), \ldots, \pi^{(j)}_{k_j}(x_j)\big)$$
- Prior on the core tensor:
$$\lambda_{h_1, \ldots, h_p} = \big(\lambda_{h_1, \ldots, h_p}(1), \ldots, \lambda_{h_1, \ldots, h_p}(d_0)\big) \sim \mathrm{Diri}(1/d_0, \ldots, 1/d_0)$$
- Prior on the independent rank-one components:
$$\big(\pi^{(j)}_{1}(x_j), \ldots, \pi^{(j)}_{k_j}(x_j)\big) \sim \mathrm{Diri}(1/k_j, \ldots, 1/k_j)$$
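A sketch (assuming numpy; n, p, d and the k_j take illustrative values) of generating data from this hierarchical model: draw the Dirichlet priors above, then the latent classes z_ij, then the responses Y_i:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, d, d0 = 100, 3, 4, 2
k = [2, 1, 3]                          # k_j = 1 means predictor j is excluded

# Diri(1/d0, ...) on each core cell; Diri(1/k_j, ...) on each pi^(j)(x_j)
lam = rng.dirichlet(np.full(d0, 1.0 / d0), size=tuple(k))          # (k1, ..., kp, d0)
pi = [rng.dirichlet(np.full(kj, 1.0 / kj), size=d) for kj in k]    # each (d, k_j)

X = rng.integers(0, d, size=(n, p))
Z = np.zeros((n, p), dtype=int)
Y = np.zeros(n, dtype=int)
for i in range(n):
    for j in range(p):
        # z_ij | X_ij = x_j ~ Mult({1,...,k_j}, pi^(j)(x_j))
        Z[i, j] = rng.choice(k[j], p=pi[j][X[i, j]])
    # Y_i | z_i1,...,z_ip ~ Mult({1,...,d0}, lambda_{z_i1,...,z_ip})
    Y[i] = rng.choice(d0, p=lam[tuple(Z[i])])
```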

Slide 13: Prior on predictor inclusion / tensor rank
- For the jth dimension, we choose the simple prior
$$P(k_j = 1) = 1 - \frac{r}{p}, \qquad P(k_j = k) = \frac{r}{(d_j - 1)\,p}, \quad k = 2, \ldots, d_j,$$
  where d_j = # levels of covariate X_j, r = expected # important features, and r̄ = specified maximum # of important features
- The effective prior on the k_j's is
$$P(k_1 = l_1, \ldots, k_p = l_p) = P(k_1 = l_1) \cdots P(k_p = l_p)\, I_{\{\#\{j : l_j > 1\} \le \bar r\}}(l_1, \ldots, l_p),$$
  where I_A(·) is the indicator function for set A
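A sketch (assuming numpy; d_j, r and p take illustrative values) of the marginal prior pmf on k_j before the truncation to at most r̄ important predictors:

```python
import numpy as np

def prior_kj(d_j, r, p):
    """P(k_j = 1) = 1 - r/p; P(k_j = k) = r / ((d_j - 1) * p) for k = 2, ..., d_j."""
    pmf = np.full(d_j, r / ((d_j - 1) * p))
    pmf[0] = 1.0 - r / p
    return pmf                          # index k - 1 holds P(k_j = k)

pmf = prior_kj(d_j=4, r=3.0, p=600)
print(pmf, pmf.sum())                   # sums to 1; mass 1 - r/p on exclusion (k_j = 1)
```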

Slide 14: Properties - bias-variance tradeoff
- Extreme data sparsity - the vast majority of combinations of Y, X_1, ..., X_p are not observed
- Critical to include sparsity assumptions - even if such assumptions do not hold, they massively reduce the variance
- Discard predictors having small impact & parameters having small values
- Makes the problem tractable & may lead to good MSE

Slide 15: Illustrative example
- Binary Y & p binary covariates X_j ∈ {-1, 1}, j = 1, ..., p
- The true model can be expressed in the form [β ∈ (0, 1)]
$$P(Y = 1 \mid X_1 = x_1, \ldots, X_p = x_p) = \frac{1}{2} + \frac{\beta}{2^2} x_1 + \cdots + \frac{\beta}{2^{p+1}} x_p.$$
- The effect of X_j decreases exponentially as j increases from 1 to p
- Natural strategy: estimate P(Y = 1 | X_1 = x_1, ..., X_p = x_p) by sample frequencies over the 1st k covariates,
$$\hat P(Y = 1 \mid X_1 = x_1, \ldots, X_k = x_k) = \frac{\#\{i : y_i = 1,\, x_{1i} = x_1, \ldots, x_{ki} = x_k\}}{\#\{i : x_{1i} = x_1, \ldots, x_{ki} = x_k\}},$$
  & ignore the remaining p - k covariates
- Suppose we have n = 2^l (k ≤ l ≤ p) observations with one in each cell of combinations of X_1, ..., X_l

Slide 16: MSE analysis
Mean squared error (MSE) can be expressed as
$$\mathrm{MSE} = \sum_{h_1, \ldots, h_p} E\big\{P(Y = 1 \mid X_1 = h_1, \ldots, X_p = h_p) - \hat P(Y = 1 \mid X_1 = h_1, \ldots, X_k = h_k)\big\}^2 = \mathrm{Bias}^2 + \mathrm{Var}.$$
The squared bias is
$$\mathrm{Bias}^2 = \sum_{h_1, \ldots, h_p} \big\{P(Y = 1 \mid X_1 = h_1, \ldots, X_p = h_p) - E\hat P(Y = 1 \mid X_1 = h_1, \ldots, X_k = h_k)\big\}^2 = 2^p \sum_{i=k+1}^{p} \Big(\frac{\beta}{2^{i+1}}\Big)^2 = \frac{\beta^2}{3}\big(2^{p-2k-2} - 2^{-p-2}\big).$$

Slide 17: MSE analysis (continued)
Finally we obtain the variance as
$$\mathrm{Var} = \sum_{h_1, \ldots, h_p} \mathrm{Var}\,\hat P(Y = 1 \mid X_1 = h_1, \ldots, X_k = h_k) = \frac{1}{3}\big\{(3 - \beta^2)\,2^{p+k-l-2} + \beta^2\, 2^{p-k-l-2}\big\}.$$
Since there are 2^p cells, the average MSE for each cell equals
$$\frac{1}{3}\big\{(3 - \beta^2)\,2^{k-l-2} + \beta^2\, 2^{-k-l-2} + \beta^2\, 2^{-2k-2} - \beta^2\, 2^{-2p-2}\big\}.$$
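A small numerical check (a sketch with illustrative values of l, p and β, assuming numpy) that evaluates the average-MSE formula above over k; its minimizer sits near l/3 = log_2(n)/3, as noted on the next slide:

```python
import numpy as np

def avg_mse(k, l, p, beta):
    """Average per-cell MSE as a function of k (covariates used), with n = 2**l."""
    return ((3 - beta**2) * 2.0**(k - l - 2)
            + beta**2 * 2.0**(-k - l - 2)
            + beta**2 * 2.0**(-2 * k - 2)
            - beta**2 * 2.0**(-2 * p - 2)) / 3.0

l, p, beta = 18, 600, 0.9                     # n = 2**18 observations
ks = np.arange(1, l + 1)
k_opt = ks[np.argmin([avg_mse(k, l, p, beta) for k in ks])]
print(k_opt, l / 3)                           # minimizer is close to log2(n)/3
```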

Slide 18: Implications of MSE analysis
- The number of predictors p has little impact on the selection of k
- k ≤ l & so the second term is small compared to the 1st & 3rd terms
- The average MSE attains its minimum at k ≈ l/3 = log_2(n)/3
- The true model is not sparse & all the predictors impact the conditional probability
- But the optimal # of predictors only depends on the log sample size

Slide 19: Borrowing of information
- A critical feature of our model is borrowing across cells
- Letting w_{h_1,...,h_p}(x_1, ..., x_p) = ∏_j π^{(j)}_{h_j}(x_j), our model is
$$P(Y = y \mid X_1 = x_1, \ldots, X_p = x_p) = \sum_{h_1, \ldots, h_p} w_{h_1, \ldots, h_p}(x_1, \ldots, x_p)\, \lambda_{h_1 \ldots h_p}(y), \qquad \text{with } \sum_{h_1, \ldots, h_p} w_{h_1, \ldots, h_p}(x_1, \ldots, x_p) = 1.$$
- View λ_{h_1...h_p}(y) as the frequency of Y = y in cell X_1 = h_1, ..., X_p = h_p
- We have a kernel estimate that borrows information via a weighted average of cell frequencies

Slide 20: Illustrative example
- One covariate X ∈ {1, ..., m} with Y ∈ {0, 1} & P_j = P(Y = 1 | X = j)
- Naive estimate: P̂_j = k_j / n_j = #{i : y_i = 1, x_i = j} / #{i : x_i = j} = sample frequencies
- Alternatively, consider the kernel estimate indexed by 0 ≤ c ≤ 1/(m - 1):
$$\tilde P_j = \{1 - (m - 1)c\}\hat P_j + c \sum_{k \ne j} \hat P_k, \qquad j = 1, \ldots, m.$$
- Use squared error loss to compare these estimators
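A sketch (assuming numpy; the cell layout and true probabilities are illustrative) of both estimators on toy data with small, balanced cell counts:

```python
import numpy as np

def naive_and_kernel(x, y, m, c):
    """Cell frequencies P_hat_j and the shrinkage estimate
    P_tilde_j = {1 - (m-1)c} P_hat_j + c * sum_{k != j} P_hat_k."""
    p_hat = np.array([y[x == j].mean() for j in range(1, m + 1)])
    p_tilde = (1 - (m - 1) * c) * p_hat + c * (p_hat.sum() - p_hat)
    return p_hat, p_tilde

rng = np.random.default_rng(5)
m = 4
x = np.repeat(np.arange(1, m + 1), 10)        # 10 observations per cell
p_true = np.array([0.45, 0.50, 0.52, 0.48])   # similar cell probabilities
y = rng.binomial(1, p_true[x - 1])
print(naive_and_kernel(x, y, m, c=0.05))
```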

Slide 21: MSE for illustrative example
$$E\{L(\hat P, P)\} = \sum_{j=1}^{m} E(\hat P_j - P_j)^2 = \sum_{j=1}^{m} \frac{P_j(1 - P_j)}{n_j}.$$
$$E\{L(\tilde P, P)\} = \sum_{j=1}^{m} E(\tilde P_j - P_j)^2 \ \text{ is a function of } c \text{ with minimum at } \ c_0 = \frac{1}{m}\,\frac{E\{L(\hat P, P)\}}{E\{L(\hat P, P)\} + \frac{1}{m-1}\sum_{i<j}(P_i - P_j)^2} \in \Big(0, \frac{1}{m-1}\Big).$$
- When the P_j's are similar, the estimator P̃ can reduce the risk to as little as 1/m of the risk of estimating the P̂_j's separately.
- If the P_j's are not similar, P̃ can still reduce the risk considerably when the cell counts {n_j} are small.

Slide 22: Setting & assumptions
- Data y^n & X^n for n subjects with p_n ≫ n (large p, small n)
- Assume d_j = d for simplicity in exposition
- Putting the true model P_0 in our tensor form, assume
  Assumption A. ∑_{j=1}^{p_n} max_{x_j} ∑_{h_j=2}^{d} π^{(j)}_{h_j}(x_j) < ∞.
  This is a near-sparsity restriction on P_0.
- Additionally assume the true conditional probabilities are strictly greater than zero:
  Assumption B. P_0(y | x) ≥ ε_0 for any x, y, for some ε_0 > 0.

Slide 23: Posterior convergence theorem
- x_1, ..., x_n independent from an unknown G_n on {1, ..., d}^{p_n}
- Let ε_n be a sequence with ε_n → 0, nε_n² → ∞ and ∑_n exp(−nε_n²) < ∞. Assume the following conditions hold:
  (i) r̄_n log p_n ≲ nε_n²,
  (ii) r̄_n d^{r̄_n} log(r̄_n / ε_n) ≲ nε_n²,
  (iii) r̄_n / p_n → 0 as n → ∞, and
  (iv) there exists a sequence of models γ_n with size r̄_n such that ∑_{j ∉ γ_n} max_{x_j} ∑_{h_j=2}^{d} π^{(j)}_{h_j}(x_j) ≤ ε_n².
- Denote d(P, P_0) = ∫ ∑_{y=1}^{d_0} |P(y | x_1, ..., x_p) − P_0(y | x_1, ..., x_p)| G_n(dx_1, ..., dx_p); then
$$\Pi_n\big\{P : d(P, P_0) \ge M\epsilon_n \mid y^n, X^n\big\} \to 0 \quad \text{a.s. } P_0^n.$$

Slide 24: Implications of theorem
- The posterior convergence rate can be very close to n^{−1/2} for appropriate hyperparameter choices
- For any α ∈ (0, 1), ε_n = n^{−(1−α)/2} log n satisfies the conditions with:
  - r_n ≍ r̄_n ≍ log n (# important predictors scales with log n)
  - p_n ≲ exp(n^α) (# candidate predictors exponential in n)
  - a sequence of models γ_n with size r̄_n such that ∑_{j ∉ γ_n} max_{x_j} ∑_{h_j=2}^{d} π^{(j)}_{h_j}(x_j) ≲ n^{α−1} log² n
- Use r_n = log_d(n), r̄_n = 2r_n as default values for the prior in applications

Slide 25: Posterior computation
- Conditionally on {k_j}: simple Gibbs sampler with Dirichlet & multinomial conditionals
- # components to update is ∏_{j=1}^{p} k_j - can blow up with p, but all but a small number of the k_j = 1
- To update {k_j} we could use RJMCMC (comparing current vs. proposed models) - doesn't scale very well computationally
- Two-stage algorithm: (i) SSVS to estimate the k_j - acceptance probabilities use approximated conditional marginal likelihoods; (ii) conditionally on {k̂_j}, run the Gibbs sampler
- Scales efficiently & excellent performance in the cases we have considered
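A minimal, self-contained sketch of stage (ii) - not the authors' implementation - showing the two conjugate Gibbs updates and the multinomial update of the latent classes, with toy data, p = 2 and the k_j fixed (assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(6)
n, d, d0 = 200, 4, 2
k = (2, 2)                                   # fixed (k_1, k_2), as after stage (i)
X = rng.integers(0, d, size=(n, 2))
Y = rng.integers(0, d0, size=n)              # toy responses
Z = rng.integers(0, 2, size=(n, 2))          # initial latent classes

def draw_dirichlet_rows(alpha):
    """Draw one Dirichlet vector per row of alpha (along the last axis)."""
    return np.apply_along_axis(rng.dirichlet, -1, alpha)

for sweep in range(100):
    # lambda_{h1 h2} | - ~ Diri(1/d0 + counts of Y within that latent cell)
    counts = np.zeros(k + (d0,))
    np.add.at(counts, (Z[:, 0], Z[:, 1], Y), 1.0)
    lam = draw_dirichlet_rows(1.0 / d0 + counts)

    # pi^(j)(x) | - ~ Diri(1/k_j + counts of z_ij among observations with X_ij = x)
    pi = []
    for j in range(2):
        cnt = np.zeros((d, k[j]))
        np.add.at(cnt, (X[:, j], Z[:, j]), 1.0)
        pi.append(draw_dirichlet_rows(1.0 / k[j] + cnt))

    # z_ij | - ~ Mult with weights pi^(j)_h(x_ij) * lambda_{..h..}(y_i)
    for j in range(2):
        for i in range(n):
            other = Z[i, 1 - j]
            like = lam[:, other, Y[i]] if j == 0 else lam[other, :, Y[i]]
            w = pi[j][X[i, j]] * like
            Z[i, j] = rng.choice(k[j], p=w / w.sum())
```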

Slide 26: Simulation study
- N = 2,000 instances, p = 600 covariates X_j ∈ {1, ..., 4} & binary response Y
- True model: 3 important predictors, X_9, X_11 and X_13
- Generate P(Y = 1 | X_9 = x_9, X_11 = x_11, X_13 = x_13) independently for each combination of (x_9, x_11, x_13)
- To obtain an optimal misclassification rate of 15%, probabilities were generated as f(U) = U² / {U² + (1 − U)²}, U ~ Unif(0, 1)
- n training samples & N − n testing
- Training: n ∈ {200, 400, 600, 800}, with 10 random training-test splits; apply our approach to each split
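A sketch (assuming numpy; the variable names are illustrative) of the data-generating mechanism just described:

```python
import numpy as np

rng = np.random.default_rng(7)
N, p, d = 2000, 600, 4
important = [8, 10, 12]                        # 0-based indices of X_9, X_11, X_13

X = rng.integers(1, d + 1, size=(N, p))

# one conditional probability per combination of the three important predictors,
# drawn as f(U) = U^2 / (U^2 + (1 - U)^2) with U ~ Unif(0, 1)
U = rng.uniform(size=(d, d, d))
table = U**2 / (U**2 + (1 - U)**2)

probs = table[X[:, important[0]] - 1, X[:, important[1]] - 1, X[:, important[2]] - 1]
Y = rng.binomial(1, probs)
print(X.shape, Y.mean())
```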

Slide 27: Simulation results - misclassification rate
Table: Testing results for the synthetic data example (RF: random forests; TF: our tensor factorization model). Columns: training size, amse of TF, misclassification rate of TF, misclassification rate of RF. (Numerical entries omitted.)
Here
$$\mathrm{amse} = \frac{1}{4^p} \sum_{x_1, \ldots, x_p} \big\{P(Y = 1 \mid x_1, \ldots, x_p) - \hat P(Y = 1 \mid x_1, \ldots, x_p)\big\}^2.$$

Slide 28: Simulation results - variable selection performance
Table: Columns 2-4 = inclusion probabilities of the 9th, 11th and 13th predictors; column 5 = maximum inclusion probability across the remaining predictors; column 6 = average inclusion probability across the remaining predictors. Quantities are averages over 10 trials, by training size. (Numerical entries omitted.)

Slide 29: Application data sets
1. Promoter gene sequences: A, C, G, T nucleotides at p = 57 positions for N = 106 sequences & binary response (promoter or not). 5-fold CV - n = 85 training & N − n = 21 test samples in each split.
2. Splice-junction gene sequences: A, C, G, T nucleotides at p = 60 positions for N = 3,175 sequences. Response classes: exon/intron boundary (EI), intron/exon boundary (IE) or neither (N). Tested with n ∈ {200, 2540}.
3. Single Photon Emission Computed Tomography (SPECT): cardiac patients classified as normal/abnormal. 267 SPECT images & 22 binary features. Previously divided into n = 80 training & N − n = 187 test samples.

Slide 30: Results
Table: Misclassification rates by data set and method (CART; RF: random forests; NN: neural networks; LASSO; SVM: support vector machine; BART: Bayesian additive regression trees; TF: our tensor factorization model). Rows: Promoter (n=85), Splice (n=200), Splice (n=2540), SPECT (n=80). (Numerical entries omitted.)
- At worst, classification performance comparable to RF, the best of the competitors
- Particularly good relative performance as n decreases & p increases

Slide 31: Variable selection - interpretability & parsimony
- Additional advantages in terms of variable selection
- In the promoter data, nucleotides at the 15th, 16th, 17th, and 39th positions were selected
- In the splice data, the 28th, 29th, 30th, 31st, 32nd and 35th positions were selected
- In the SPECT data, the 11th, 13th and 16th predictors were selected
- In each case, excellent classification performance was obtained based on a small subset of the predictors

Slide 32: Generalization - conditional distribution modeling
Generalization: conditional distribution estimation,
$$f(y \mid x) = \sum_{h=1}^{k} \sum_{h_1=1}^{k_1} \cdots \sum_{h_p=1}^{k_p} \pi_{h h_1 \cdots h_p}(x)\, K(y; \theta_{h h_1 \cdots h_p}),$$
where {π_{h h_1 ⋯ h_p}(x)} is a core probability tensor of predictor-dependent weights on a multiway array of kernels.
Motivated by the above conditional tensor factorization for classification, let
$$\pi_{h h_1 \cdots h_p}(x) = \pi_h \prod_{j=1}^{p} \pi^{(j)}_{h_j}(x_j).$$

Slide 33: Linear Tucker density regression
Letting x ∈ X = [0, 1]^p and ψ_j ∈ [0, 1], choose
$$\pi^{(j)}_1(x_j) = 1 - x_j \psi_j, \qquad \pi^{(j)}_2(x_j) = x_j \psi_j, \qquad k_j = 2, \quad j = 1, \ldots, p.$$
The model linearly interpolates but otherwise is extremely flexible. In the simple case in which p = 1 & a Gaussian kernel, we have
$$f(y \mid x) = \sum_{h=1}^{k} \pi_h \big\{(1 - x\psi)\, N(y; \mu_{h1}, \tau_{h1}^{-1}) + x\psi\, N(y; \mu_{h2}, \tau_{h2}^{-1})\big\},$$
which induces the linear mean regression model
$$E(y \mid x) = \Big\{\sum_{h=1}^{k} \pi_h \mu_{h1}\Big\} + \Big\{\sum_{h=1}^{k} \pi_h \psi(\mu_{h2} - \mu_{h1})\Big\} x = \beta_0 + \beta_1 x.$$
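A sketch of this p = 1 special case (assuming numpy and scipy; the mixture parameters are illustrative draws), evaluating f(y | x) and the implied linear mean:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)
k = 3
pi = rng.dirichlet(np.ones(k))                # mixture weights pi_h
mu = rng.normal(size=(k, 2))                  # mu_{h1}, mu_{h2}
tau = rng.gamma(2.0, 1.0, size=(k, 2))        # precisions tau_{h1}, tau_{h2}
psi = 0.7

def f_cond(y, x):
    """f(y|x) = sum_h pi_h {(1 - x psi) N(y; mu_h1, 1/tau_h1) + x psi N(y; mu_h2, 1/tau_h2)}."""
    dens = (1 - x * psi) * norm.pdf(y, mu[:, 0], 1 / np.sqrt(tau[:, 0])) \
         + x * psi * norm.pdf(y, mu[:, 1], 1 / np.sqrt(tau[:, 1]))
    return float(np.dot(pi, dens))

# implied linear mean E(y|x) = beta0 + beta1 * x
beta0 = np.dot(pi, mu[:, 0])
beta1 = psi * np.dot(pi, mu[:, 1] - mu[:, 0])
print(f_cond(0.5, 0.3), beta0, beta1)
```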

Slide 34: Linear Tucker density regression - comments
- Different quantiles of f(y | x) change linearly with x, but with slopes that vary
- If k = 1 and τ_{h1} = τ_{h2} = τ_h, we obtain simple normal linear regression
- The density changes linearly from f(y | x = 0) = ∑_h π_h N(y; μ_{h1}, τ_{h1}^{-1}) to f(y | x = 1) = ∑_h π_h N(y; μ_{h2}, τ_{h2}^{-1}) as x increases
- As p increases, we still interpolate linearly but accommodate interactions
- Posterior computation: single-stage Gibbs sampler (multinomial, Dirichlet, normal-gamma, Bernoulli, beta conditionals)

Slide 35: Joint tensor factorizations
- Focused in this talk on conditional Tucker factorizations
- Can also use probabilistic tensor factorizations for joint modeling
- Very useful for huge sparse contingency table analysis
- Same ideas provide a type of multivariate generalization of current Bayes discrete mixtures
- Instead of a single cluster index, multiple dependent cluster indices underlying each type of data

Slide 36: References
- Banerjee, A., Murray, J. & Dunson, D.B. (2012). Nonparametric Bayes infinite tensor factorization priors.
- Bhattacharya, A. & Dunson, D.B. (2012). Simplex factor models for multivariate unordered categorical data. JASA, 107.
- Bhattacharya, A. & Dunson, D.B. (2012). Nonparametric Bayes testing of associations in high-dimensional categorical data. (almost done!)
- Dunson, D.B. & Xing, C. (2009). Nonparametric Bayes modeling of multivariate categorical data. JASA, 104.
- Ghosh, A. & Dunson, D.B. (2012). Conditional distribution from high-dimensional interacting predictors. (in progress)
- Yang, Y. & Dunson, D.B. (2012). Bayesian conditional tensor factorizations for high-dimensional classification. arXiv.
