From Arabidopsis roots to bilinear equations

1 From Arabidopsis roots to bilinear equations. Dustin Cartwright, October 22. Joint work with Philip Benfey, Siobhan Brady, David Orlando (Duke University) and Bernd Sturmfels (UC Berkeley); research supported by the DARPA project Fundamental Laws of Biology.

2 Arabidopsis root

3 Arabidopsis root Gene expression microarrays are a tool to understand dynamics and regulatory processes.

4 Arabidopsis root Gene expression microarrays are a tool to understand dynamics and regulatory processes. Two ways of separating cells in the lab: (1) chemically, using 18 markers (colors in diagram A)

5 Arabidopsis root Gene expression microarrays are a tool to understand dynamics and regulatory processes. Two ways of separating cells in the lab: (1) chemically, using 18 markers (colors in diagram A); (2) physically, using 13 longitudinal sections (red lines in diagram B)

6 Measurement along two axes Markers measure variation among cell types.

7 Measurement along two axes Markers measure variation among cell types. Longitudinal sections measure variation along developmental stage.

8 Measurement along two axes Markers measure variation among cell types. Longitudinal sections measure variation along developmental stage. A naïve approach would use the variation within each set of experiments as a proxy for variation along the corresponding axis.

9 Problem with naïve approach Correspondence between markers and cell types is imperfect.

10 Problem with naïve approach Correspondence between markers and cell types is imperfect. For example, the sample labelled APL consists of a mixture of two cell types, phloem and phloem companion cells. [Table on slide: contributions of the cell types phloem, phloem companion cells, and columella by section; numerical entries not recovered.]

11 Problem with naïve approach Similarly, the longitudinal sections do not have the same mixture of cells. For example: In each of sections 1-5, 30-50% of the cells are lateral root cap cells.

12 Problem with naïve approach Similarly, the longitudinal sections do not have the same mixture of cells. For example: In each of sections 1-5, 30-50% of the cells are lateral root cap cells. In sections 6-12, there are no lateral root cap cells.

13 Problem with naïve approach Similarly, the longitudinal sections do not have the same mixture of cells. For example: In each of sections 1-5, 30-50% of the cells are lateral root cap cells. In sections 6-12, there are no lateral root cap cells. Conclusion: We need to analyze each transcript across all 31 (= 18 + 13) experiments to model the expression pattern in the whole root.

14 Model A cluster consists of cells of the same type in the same section. Each cluster has an expression level.

15 Model A cluster consists of cells of the same type in the same section. Each cluster has an expression level. For each marker and each longitudinal section, we have a measurement functional, a linear combination of the expression levels in different clusters.

16 Model A cluster consists of cells of the same type in the same section. Each cluster has an expression level. For each marker and each longitudinal section, we have a measurement functional, a linear combination of the expression levels in different clusters. The coefficients of these functionals can be determined from: (1) the numbers of cells present in each section; (2) the marker selection patterns.

17 Model A cluster consists of cells of the same type in the same section. Each cluster has an expression level. For each marker and each longitudinal section, we have a measurement functional, a linear combination of the expression levels in different clusters. The coefficients of these functionals can be determined from: (1) the numbers of cells present in each section; (2) the marker selection patterns. Under-constrained system: 31 (= 18 + 13) functionals and 129 clusters.

18 Assumption Since the system is under-constrained, we make the following assumption.

19 Assumption Since the system is under-constrained, we make the following assumption. The dependence of the expression level on the section is independent of its dependence on the cell type.

20 Assumption Since the system is under-constrained, we make the following assumption. The dependence of the expression level on the section is independent of its dependence on the cell type. More precisely, the expression level of the cluster in section i and cell type j is x_i y_j for some vectors x and y.

21 Assumption Since the system is under-constrained, we make the following assumption. The dependence of the expression level on the section is independent of its dependence on the cell type. More precisely, the expression level of the cluster in section i and cell type j is x_i y_j for some vectors x and y. Example: If the expression level is either 0 or 1 (off or on), then our assumption says that it is 1 exactly on the combination of some subset of the sections with some subset of the cell types.
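For concreteness, here is a toy sketch (not from the talk) of this separability assumption in NumPy: the full table of cluster expression levels is the outer product of a per-section vector x and a per-cell-type vector y, and in the on/off case it equals 1 exactly on a product of subsets.

```python
import numpy as np

# Toy illustration of the separability assumption: the expression level of the
# cluster in section i and cell type j is x_i * y_j, so the full table of
# cluster expression levels is the outer product of x and y.
x = np.array([0, 1, 1, 0, 1])   # on/off pattern over 5 hypothetical sections
y = np.array([1, 0, 1])         # on/off pattern over 3 hypothetical cell types
expression = np.outer(x, y)     # equals 1 exactly where both section and type are "on"
print(expression)
```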

22 Non-negative bilinear equations
A^(1), ..., A^(k): n × m non-negative matrices (cell mixture)
o_1, ..., o_k: non-negative scalars (expression levels)
Solve (approximately)
f_1(x, y) := x^t A^(1) y = o_1
...
f_k(x, y) := x^t A^(k) y = o_k
x_1 + ... + x_n = 1

23 Non-negative bilinear equations
A^(1), ..., A^(k): n × m non-negative matrices (cell mixture)
o_1, ..., o_k: non-negative scalars (expression levels)
Solve (approximately)
f_1(x, y) := x^t A^(1) y = o_1
...
f_k(x, y) := x^t A^(k) y = o_k
x_1 + ... + x_n = 1
for x and y non-negative vectors of dimensions n × 1 and m × 1, respectively.
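As a minimal numerical sketch (toy sizes and random data, not the root data set), each f_l is the bilinear form x^t A^(l) y, so all k measurements can be evaluated with a single einsum:

```python
import numpy as np

def bilinear_functionals(A, x, y):
    """Evaluate f_l(x, y) = x^t A^(l) y for l = 1, ..., k; A has shape (k, n, m)."""
    return np.einsum('i,lij,j->l', x, A, y)

rng = np.random.default_rng(0)
k, n, m = 4, 5, 3                      # toy sizes, not the 31 experiments / 129 clusters
A = rng.random((k, n, m))              # non-negative mixture matrices A^(1), ..., A^(k)
x = rng.dirichlet(np.ones(n))          # non-negative, with x_1 + ... + x_n = 1
y = rng.random(m)                      # non-negative expression profile over cell types
o = bilinear_functionals(A, x, y)      # the k (noise-free) measurements o_1, ..., o_k
print(o)
```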

24 Probabilistic interpretation
f_l(x, y) := Σ_{i,j} A^(l)_{ij} x_i y_j for l = 1, ..., k
Up to scaling, this vector has the form of the family of probability distributions (depending on vectors x and y)

25 Probabilistic interpretation
f_l(x, y) := Σ_{i,j} A^(l)_{ij} x_i y_j for l = 1, ..., k
Up to scaling, this vector has the form of the family of probability distributions (depending on vectors x and y) coming from the following process:
1. Pick a pair of integers from {1, ..., n} × {1, ..., m}, with (i, j) having probability proportional to (Σ_l A^(l)_{ij}) x_i y_j

26 Probabilistic interpretation
f_l(x, y) := Σ_{i,j} A^(l)_{ij} x_i y_j for l = 1, ..., k
Up to scaling, this vector has the form of the family of probability distributions (depending on vectors x and y) coming from the following process:
1. Pick a pair of integers from {1, ..., n} × {1, ..., m}, with (i, j) having probability proportional to (Σ_l A^(l)_{ij}) x_i y_j
2. Output an integer from {1, ..., k}. Conditional on having picked i and j in the previous step, the probability of outputting l is A^(l)_{ij} / Σ_l A^(l)_{ij}
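A sketch of this two-step process as code, assuming arrays shaped as in the previous snippet; `sample_observation` is a hypothetical helper, not part of the talk (e.g. call it as sample_observation(A, x, y, np.random.default_rng(0))):

```python
import numpy as np

def sample_observation(A, x, y, rng):
    """Draw one output l from the two-step process described above."""
    k, n, m = A.shape
    Abar = A.sum(axis=0)                               # Σ_l A^(l)_ij
    joint = Abar * np.outer(x, y)                      # unnormalized probability of the pair (i, j)
    idx = rng.choice(n * m, p=(joint / joint.sum()).ravel())
    i, j = np.unravel_index(idx, (n, m))               # step 1: the chosen pair (i, j)
    return rng.choice(k, p=A[:, i, j] / Abar[i, j])    # step 2: l with prob. A^(l)_ij / Σ_l A^(l)_ij
```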

27 Maximum Likelihood Estimation
Rescaling both sides of our system of equations:
f_l(x, y) / Σ_{l'} f_{l'}(x, y) = o_l / Σ_{l'} o_{l'} for l = 1, ..., k

28 Maximum Likelihood Estimation
Rescaling both sides of our system of equations:
f_l(x, y) / Σ_{l'} f_{l'}(x, y) = o_l / Σ_{l'} o_{l'} for l = 1, ..., k
Finding an approximate solution to these equations is known as Maximum Likelihood Estimation.

29 Kullback-Leibler divergence
Kullback-Leibler divergence gives a way of comparing two probability distributions:
D(z ‖ f(x, y)) := Σ_l z_l log( z_l / f_l(x, y) )

30 Kullback-Leibler divergence
Kullback-Leibler divergence gives a way of comparing two probability distributions:
D(z ‖ f(x, y)) := Σ_l ( z_l log( z_l / f_l(x, y) ) − z_l + f_l(x, y) )
We generalize the divergence to any pair of non-negative vectors.

31 Kullback-Leibler divergence
Kullback-Leibler divergence gives a way of comparing two probability distributions:
D(z ‖ f(x, y)) := Σ_l ( z_l log( z_l / f_l(x, y) ) − z_l + f_l(x, y) )
We generalize the divergence to any pair of non-negative vectors.
By an approximate solution to a system, we will mean a solution which minimizes the Kullback-Leibler divergence.
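The generalized divergence between non-negative vectors is easy to state in code; this is a sketch of the formula above (with the usual convention 0 log 0 = 0), not the authors' implementation:

```python
import numpy as np

def generalized_kl(z, f):
    """Generalized Kullback-Leibler divergence D(z || f) for non-negative vectors:
    D(z || f) = Σ_l ( z_l log(z_l / f_l) - z_l + f_l ), with 0 log 0 = 0.
    It reduces to the ordinary KL divergence when both z and f sum to 1."""
    z, f = np.asarray(z, dtype=float), np.asarray(f, dtype=float)
    log_term = np.where(z > 0, z * np.log(np.where(z > 0, z, 1.0) / f), 0.0)
    return float(np.sum(log_term - z + f))
```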

32 Expectation Maximization
Want to solve:
Σ_{i,j} A^(l)_{ij} x_i y_j = o_l for l = 1, ..., k   (1)

33 Expectation Maximization
Want to solve:
Σ_{i,j} A^(l)_{ij} x_i y_j = o_l for l = 1, ..., k   (1)
Start with guesses x̃, ỹ

34 Expectation Maximization
Want to solve:
Σ_{i,j} A^(l)_{ij} x_i y_j = o_l for l = 1, ..., k   (1)
Start with guesses x̃, ỹ
Estimate the contribution of the (i, j) term on the left side of equation (1) needed to obtain equality:
( x̃_i ỹ_j A^(l)_{ij} / Σ_{i',j'} A^(l)_{i'j'} x̃_{i'} ỹ_{j'} ) · o_l =: e_{ijl}

35 Expectation Maximization
Want to solve:
Σ_{i,j} A^(l)_{ij} x_i y_j = o_l for l = 1, ..., k   (1)
Start with guesses x̃, ỹ
Estimate the contribution of the (i, j) term on the left side of equation (1) needed to obtain equality:
( x̃_i ỹ_j A^(l)_{ij} / Σ_{i',j'} A^(l)_{i'j'} x̃_{i'} ỹ_{j'} ) · o_l =: e_{ijl}
Find an approximate solution to the system:
( Σ_l A^(l)_{ij} ) x_i y_j = Σ_l e_{ijl} =: e_{ij}

36 Expectation Maximization
Want to solve:
Σ_{i,j} A^(l)_{ij} x_i y_j = o_l for l = 1, ..., k   (1)
Start with guesses x̃, ỹ
Estimate the contribution of the (i, j) term on the left side of equation (1) needed to obtain equality:
( x̃_i ỹ_j A^(l)_{ij} / Σ_{i',j'} A^(l)_{i'j'} x̃_{i'} ỹ_{j'} ) · o_l =: e_{ijl}
Find an approximate solution to the system:
( Σ_l A^(l)_{ij} ) x_i y_j = Σ_l e_{ijl} =: e_{ij}
Repeat until convergence
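In code, the E-step above amounts to spreading each observation o_l over the (i, j) terms in proportion to their current contribution; this is a sketch under the toy setup from the earlier snippets, not the authors' implementation. The inner fit of (Σ_l A^(l)_ij) x_i y_j to e_ij is done by the iterative proportional fitting sketched after slide 41.

```python
import numpy as np

def em_e_step(A, o, x_t, y_t):
    """E-step: the table e_ij = Σ_l e_ijl of estimated contributions, where
    e_ijl = ( x~_i y~_j A^(l)_ij / Σ_{i',j'} A^(l)_{i'j'} x~_{i'} y~_{j'} ) * o_l."""
    denom = np.einsum('i,lij,j->l', x_t, A, y_t)               # current value of each f_l(x~, y~)
    return np.einsum('i,j,lij,l->ij', x_t, y_t, A, o / denom)  # e_ij
```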

37 Likelihood maximization for monomial models
g : R^n × R^m → R^{nm}
(x_i), (y_j) ↦ ( A_{ij} x_i y_j )
where A_{ij} = Σ_l A^(l)_{ij}.

38 Likelihood maximization for monomial models
g : R^n × R^m → R^{nm}
(x_i), (y_j) ↦ ( A_{ij} x_i y_j )
where A_{ij} = Σ_l A^(l)_{ij}.
Moment map (taking row sums and column sums):
μ : R^{nm} → R^n × R^m
( b_{ij} ) ↦ ( (Σ_j b_{ij}), (Σ_i b_{ij}) )

39 Likelihood maximization for monomial models
g : R^n × R^m → R^{nm}
(x_i), (y_j) ↦ ( A_{ij} x_i y_j )
where A_{ij} = Σ_l A^(l)_{ij}.
Moment map (taking row sums and column sums):
μ : R^{nm} → R^n × R^m
( b_{ij} ) ↦ ( (Σ_j b_{ij}), (Σ_i b_{ij}) )
Theorem. The Kullback-Leibler divergence D(z ‖ g(x, y)) is minimized over all x and y when μ(z) equals μ(g(x, y)).
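For concreteness, the moment map is just the pair of row sums and column sums; the theorem says the KL-optimal (x, y) is exactly where μ(g(x, y)) matches μ(z), which is the fixed-point condition the IPF updates on the next slides enforce. A two-line sketch:

```python
import numpy as np

def moment_map(b):
    """μ(b) for a table b in R^{nm}: its vector of row sums and vector of column sums."""
    return b.sum(axis=1), b.sum(axis=0)
```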

40 Inverting the moment map: Iterative Proportional Fitting
[Diagram: b and g(x, y) lie in R^{nm}; the moment map μ sends them to μ(b) and μ(g(x, y)) in R^n × R^m.]

41 Inverting the moment map: Iterative Proportional Fitting
Adjust x̃_i: x̃_i ← x̃_i · (Σ_j b_{ij}) / (Σ_j A_{ij} x̃_i ỹ_j)
Adjust ỹ_j: ỹ_j ← ỹ_j · (Σ_i b_{ij}) / (Σ_i A_{ij} x̃_i ỹ_j)
Iterate until convergence
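A sketch of these IPF updates (not the authors' code); here `A_bar` plays the role of A_ij = Σ_l A^(l)_ij and `b` is the target table, e.g. the e_ij produced by the E-step sketched earlier. Alternating em_e_step and ipf_fit until convergence gives the full algorithm; if the convention x_1 + ... + x_n = 1 is wanted, rescale x to sum to 1 afterwards and absorb the factor into y.

```python
import numpy as np

def ipf_fit(A_bar, b, x_t, y_t, iters=100):
    """Iterative proportional fitting: rescale x~ and y~ until the row and column
    sums of A_bar_ij * x~_i * y~_j match those of the target table b."""
    x_t, y_t = x_t.astype(float).copy(), y_t.astype(float).copy()
    for _ in range(iters):
        fit = A_bar * np.outer(x_t, y_t)
        x_t *= b.sum(axis=1) / fit.sum(axis=1)   # x~_i <- x~_i * (Σ_j b_ij) / (Σ_j A_bar_ij x~_i y~_j)
        fit = A_bar * np.outer(x_t, y_t)
        y_t *= b.sum(axis=0) / fit.sum(axis=0)   # y~_j <- y~_j * (Σ_i b_ij) / (Σ_i A_bar_ij x~_i y~_j)
    return x_t, y_t
```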

42 Validation: Preliminary results On the left is a visual representation of the reconstructed expression levels. On the right, the expression levels for the same transcript are visualized using GFP.
