From Arabidopsis roots to bilinear equations
Dustin Cartwright, October 22. Joint with Philip Benfey, Siobhan Brady, David Orlando (Duke University) and Bernd Sturmfels (UC Berkeley). Research supported by the DARPA project Fundamental Laws of Biology.
Arabidopsis root

Gene expression microarrays are a tool to understand dynamics and regulatory processes. Two ways of separating cells in the lab:
- Chemically, using 18 markers (colors in diagram A)
- Physically, using 13 longitudinal sections (red lines in diagram B)
Measurement along two axes

Markers measure variation among cell types. Longitudinal sections measure variation along developmental stage. A naïve approach would use the variation among each set of experiments as a proxy for variation along each of the two axes.
Problem with naïve approach

The correspondence between markers and cell types is imperfect. For example, the sample labelled APL consists of a mixture of two cell types, phloem and phloem companion cells. (The slide's table of per-section proportions for phloem, phloem companion cells, and columella is not reproduced here.)
Problem with naïve approach

Similarly, the longitudinal sections do not all have the same mixture of cells. For example, in each of sections 1-5, 30-50% of the cells are lateral root cap cells, while in sections 6-12 there are no lateral root cap cells.

Conclusion: we need to analyze each transcript across all 31 (= 18 + 13) experiments to model the expression pattern in the whole root.
Model

A cluster consists of cells of the same type in the same section. Each cluster has an expression level. For each marker and each longitudinal section, we have a measurement functional: a linear combination of the expression levels of the different clusters.

The coefficients of these functionals can be determined from:
- the numbers of cells present in each section
- the marker selection patterns

This yields an under-constrained system: 31 (= 18 + 13) functionals and 129 clusters.
Assumption

Since the system is under-constrained, we make the following assumption: the dependence of the expression level on the section is independent of its dependence on the cell type. More precisely, the expression level of the cluster in section i and cell type j is x_i y_j for some vectors x and y.

Example: If the expression level is either 0 or 1 (off or on), then the assumption says that it is 1 exactly on the combination of some subset of the sections with some subset of the cell types.
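As a minimal illustration of this separability assumption (a made-up example, not the talk's actual sections, cell types, or values), the cluster expression levels form a rank-one, outer-product array:

import numpy as np

# Hypothetical sizes and values: n sections, m cell types
# (not the real numbers of sections and cell types).
x = np.array([0.1, 0.4, 0.3, 0.2])   # dependence on the section (i = 1..n)
y = np.array([2.0, 0.0, 1.5])        # dependence on the cell type (j = 1..m)

# Under the assumption, the expression level of the cluster in section i and
# cell type j is x_i * y_j, so the table of cluster levels is an outer product.
E = np.outer(x, y)
print(E)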
Non-negative bilinear equations

Given
A^{(1)}, ..., A^{(k)}: non-negative n × m matrices (cell mixture),
o_1, ..., o_k: non-negative scalars (expression levels),

solve (approximately)

f_1(x, y) := x^T A^{(1)} y = o_1
...
f_k(x, y) := x^T A^{(k)} y = o_k
x_1 + ... + x_n = 1

for non-negative vectors x and y of dimensions n and m respectively.
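For concreteness, here is a small NumPy sketch of this setup (hypothetical dimensions and random data, much smaller than the real 31-functional, 129-cluster problem) that evaluates the bilinear forms f_l(x, y) = x^T A^{(l)} y:

import numpy as np

rng = np.random.default_rng(0)

n, m, k = 4, 3, 5                     # hypothetical sizes
A = rng.random((k, n, m))             # non-negative matrices A^(1), ..., A^(k)
x = rng.random(n); x /= x.sum()       # non-negative, with x_1 + ... + x_n = 1
y = rng.random(m)                     # non-negative

def f(x, y, A):
    # f_l(x, y) = x^T A^(l) y, computed for every l at once
    return np.einsum('i,lij,j->l', x, A, y)

o = f(x, y, A)                        # the right-hand sides o_1, ..., o_k
print(o)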
Probabilistic interpretation

Write

f_l(x, y) := \sum_{i,j} A^{(l)}_{ij} x_i y_j   for l = 1, ..., k.

Up to scaling, the vector (f_1(x, y), ..., f_k(x, y)) has the form of a family of probability distributions (depending on the vectors x and y) coming from the following process:

1. Pick a pair of integers from {1, ..., n} × {1, ..., m}, with (i, j) having probability proportional to ( \sum_l A^{(l)}_{ij} ) x_i y_j.
2. Output an integer from {1, ..., k}. Conditional on having picked i and j in the previous step, the probability of outputting l is A^{(l)}_{ij} / \sum_{l'} A^{(l')}_{ij}.
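This two-step process can be checked numerically. The sketch below (hypothetical data again) computes the marginal distribution of the output l and compares it with the normalized vector of f_l(x, y) values:

import numpy as np

rng = np.random.default_rng(1)
n, m, k = 4, 3, 5
A = rng.random((k, n, m))
x, y = rng.random(n), rng.random(m)

Asum = A.sum(axis=0)                          # sum_l A^(l)_{ij}
p_ij = Asum * np.outer(x, y)                  # step 1: unnormalized probability of (i, j)
p_ij /= p_ij.sum()
p_l_given_ij = A / Asum                       # step 2: P(l | i, j) = A^(l)_{ij} / sum_l A^(l)_{ij}

p_l = np.einsum('lij,ij->l', p_l_given_ij, p_ij)   # marginal distribution of the output l
f = np.einsum('i,lij,j->l', x, A, y)               # f_l(x, y)
print(np.allclose(p_l, f / f.sum()))               # should print True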
Maximum Likelihood Estimation

Rescaling both sides of our system of equations:

f_l(x, y) / \sum_{l'} f_{l'}(x, y) = o_l / \sum_{l'} o_{l'}   for l = 1, ..., k.

Finding an approximate solution to these equations is known as maximum likelihood estimation.
Kullback-Leibler divergence

The Kullback-Leibler divergence gives a way of comparing two probability distributions:

D(z || f(x, y)) := \sum_l ( z_l \log( z_l / f_l(x, y) ) - z_l + f_l(x, y) ).

We generalize the divergence to any pair of non-negative vectors; when both z and f(x, y) are probability distributions, the -z_l + f_l(x, y) terms sum to zero and the usual divergence is recovered. By an approximate solution to a system, we mean a solution which minimizes the Kullback-Leibler divergence.
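A small Python helper for this generalized divergence might look as follows (a sketch; the epsilon guard and the 0 log 0 = 0 convention are implementation choices, not from the talk):

import numpy as np

def gen_kl(z, w, eps=1e-12):
    # Generalized Kullback-Leibler divergence between non-negative vectors:
    #   D(z || w) = sum_l ( z_l * log(z_l / w_l) - z_l + w_l ),
    # with the convention that a term with z_l = 0 contributes just w_l.
    z = np.asarray(z, dtype=float)
    w = np.asarray(w, dtype=float)
    terms = np.where(z > 0, z * np.log((z + eps) / (w + eps)) - z + w, w)
    return terms.sum()

# For two probability distributions the "- z_l + w_l" parts sum to zero,
# so this reduces to the usual KL divergence.
print(gen_kl([0.5, 0.5], [0.9, 0.1]))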
Expectation Maximization

Want to solve

\sum_{i,j} A^{(l)}_{ij} x_i y_j = o_l   for l = 1, ..., k.   (1)

- Start with guesses \tilde{x}, \tilde{y}.
- Estimate the contribution of the (i, j) term of the left side of equation (1) needed to obtain equality:

e_{ijl} := ( A^{(l)}_{ij} \tilde{x}_i \tilde{y}_j / \sum_{i',j'} A^{(l)}_{i'j'} \tilde{x}_{i'} \tilde{y}_{j'} ) o_l

- Find an approximate solution to the system

( \sum_l A^{(l)}_{ij} ) x_i y_j = \sum_l e_{ijl} =: e_{ij}

- Repeat until convergence.
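One possible NumPy rendering of this scheme (a sketch under the stated assumptions, not the authors' actual implementation; the inner "approximate solution" step is carried out with a single row/column rescaling pass):

import numpy as np

def em_bilinear(A, o, n_iter=2000, seed=0):
    # A has shape (k, n, m); o has shape (k,).
    k, n, m = A.shape
    rng = np.random.default_rng(seed)
    x, y = rng.random(n), rng.random(m)            # initial guesses x~, y~
    Asum = A.sum(axis=0)                           # (sum_l A^(l))_{ij}
    for _ in range(n_iter):
        f = np.einsum('i,lij,j->l', x, A, y)       # current values f_l(x~, y~)
        # E-step: split each o_l among the (i, j) terms in proportion to their
        # current contributions, e_{ijl} = (A^(l)_{ij} x~_i y~_j / f_l) * o_l.
        e = A * np.outer(x, y) / f[:, None, None] * o[:, None, None]
        e_ij = e.sum(axis=0)                       # e_{ij} = sum_l e_{ijl}
        # M-step (one IPF-style pass): rescale x, then y, so that the row and
        # column sums of Asum_{ij} x_i y_j approach those of e_{ij}.
        x = x * e_ij.sum(axis=1) / (Asum * np.outer(x, y)).sum(axis=1)
        y = y * e_ij.sum(axis=0) / (Asum * np.outer(x, y)).sum(axis=0)
    scale = x.sum()
    return x / scale, y * scale                    # enforce x_1 + ... + x_n = 1

# Tiny synthetic check with hypothetical data: generate o from a known pair and refit.
rng = np.random.default_rng(1)
A = rng.random((5, 4, 3))
o = np.einsum('i,lij,j->l', rng.random(4), A, rng.random(3))
x_hat, y_hat = em_bilinear(A, o)
print(np.einsum('i,lij,j->l', x_hat, A, y_hat))    # should be close to o
print(o)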
Likelihood maximization for monomial models

Define

g : R^n × R^m -> R^{nm},   (x_i), (y_j) \mapsto ( A_{ij} x_i y_j ),

where A_{ij} = \sum_l A^{(l)}_{ij}.

The moment map (taking row sums and column sums) is

\mu : R^{nm} -> R^n × R^m,   (b_{ij}) \mapsto ( (\sum_j b_{ij})_i , (\sum_i b_{ij})_j ).

Theorem. The Kullback-Leibler divergence D(z || g(x, y)) is minimized over all x and y when \mu(z) equals \mu(g(x, y)).
Inverting the moment map: Iterative Proportional Fitting

Given a target b in R^{nm}, we want x and y with \mu(g(x, y)) = \mu(b), i.e. g(x, y) and b should have the same row and column sums.

- Adjust \tilde{x}_i:  \tilde{x}_i \leftarrow \tilde{x}_i ( \sum_j b_{ij} ) / ( \sum_j A_{ij} \tilde{x}_i \tilde{y}_j )
- Adjust \tilde{y}_j:  \tilde{y}_j \leftarrow \tilde{y}_j ( \sum_i b_{ij} ) / ( \sum_i A_{ij} \tilde{x}_i \tilde{y}_j )
- Iterate until convergence.
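A direct NumPy sketch of these updates (hypothetical target b; convergence is only illustrated here, not proved):

import numpy as np

def ipf(A, b, n_iter=500):
    # Rescale x and y until the row and column sums of (A_{ij} x_i y_j)
    # match those of the target matrix b, i.e. mu(g(x, y)) = mu(b).
    n, m = A.shape
    x, y = np.ones(n), np.ones(m)
    for _ in range(n_iter):
        g = A * np.outer(x, y)
        x = x * b.sum(axis=1) / g.sum(axis=1)      # adjust x_i using row sums
        g = A * np.outer(x, y)
        y = y * b.sum(axis=0) / g.sum(axis=0)      # adjust y_j using column sums
    return x, y

rng = np.random.default_rng(2)
A = rng.random((4, 3))                             # here A_{ij} = sum_l A^(l)_{ij}
b = rng.random((4, 3))
x, y = ipf(A, b)
g = A * np.outer(x, y)
print(np.allclose(g.sum(axis=1), b.sum(axis=1)),   # row sums agree
      np.allclose(g.sum(axis=0), b.sum(axis=0)))   # column sums agree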
Validation: Preliminary results

On the left is a visual representation of the reconstructed expression levels; on the right, the expression levels for the same transcript are visualized using GFP.