Maximum Likelihood Estimation in Latent Class Models for Contingency Table Data
Stephen E. Fienberg
Department of Statistics, Machine Learning Department, Cylab
Carnegie Mellon University

May 20, 2008 (joint work with P. Hersh, A. Rinaldo, and Yi Zhou)
Outline

- Latent Class Models: existence and uniqueness of the MLE; identifiability; model selection
- Algebraic Geometry: the geometric description of LC models
- Examples: the 100 Swiss Francs problem; a sparse 2^16 table from the National Long Term Care Survey
100 Swiss Francs Problem

The observed 4x4 table (N = 40) is

    n = \begin{pmatrix} 4 & 2 & 2 & 2 \\ 2 & 4 & 2 & 2 \\ 2 & 2 & 4 & 2 \\ 2 & 2 & 2 & 4 \end{pmatrix},

so the log-likelihood is

    l(p) = 4 \sum_i \log p_{ii} + 2 \sum_{i \neq j} \log p_{ij}.

Problem, version 1: find the MLE for the 4x4 contingency table n under the latent class model with k = 2 classes.

Problem, version 2: find the 4x4 matrix p of rank 2 that is closest, in the sense of maximum likelihood, to the empirical distribution (1/40) n.

The slide then shows a candidate table [lost in transcription] and asks: is it an MLE? Can you prove it is a global maximum? Is the MLE unique? If not, can you suggest other MLEs?
Categorical Data and Contingency Tables

Consider k categorical variables, X_1, ..., X_k, where each X_i takes values in the finite set [d_i] = {1, ..., d_i}. The cross-classification of N i.i.d. realizations of (X_1, ..., X_k) produces a random integer-valued vector n, where

    n_{i_1,...,i_k} = \sum_{j=1}^{N} 1\{X_1^{(j)} = i_1, ..., X_k^{(j)} = i_k\}.

The contingency table n has a Multinomial(N, p) distribution, where d = \prod_i d_i and p is a point in the (d-1)-dimensional probability simplex \Delta_{d-1} with coordinates

    p_{i_1,...,i_k} = \Pr\{(X_1, ..., X_k) = (i_1, ..., i_k)\}.

The likelihood function is then

    L(p) = \frac{N!}{\prod_{i_1,...,i_k} n_{i_1,...,i_k}!} \prod_{i_1,...,i_k} p_{i_1,...,i_k}^{n_{i_1,...,i_k}}.
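As a minimal sketch of this likelihood (function name and interface are ours, not the talk's), the multinomial log-likelihood of a table flattened to a list of cell counts:

```python
from math import lgamma, log

# Multinomial log-likelihood of observed counts under cell probabilities p,
# matching L(p) above: log N! - sum log n_c! + sum n_c log p_c.
def multinomial_loglik(counts, probs):
    N = sum(counts)
    ll = lgamma(N + 1) - sum(lgamma(c + 1) for c in counts)
    ll += sum(c * log(p) for c, p in zip(counts, probs) if c > 0)
    return ll
```

For example, two cells with counts (1, 1) and p = (1/2, 1/2) give log L = log(2 * 1/4) = log(1/2).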
Latent Structure

Let H be an unobservable latent variable taking values in the set [r] = {1, ..., r}. In its most basic version, a.k.a. the naive Bayes model, the LC model postulates that, conditional on H, the variables X_1, ..., X_k are mutually independent:

    p_{i_1,...,i_k} = \sum_{h=1}^{r} p_{i_1,...,i_k,h} = \sum_{h=1}^{r} \lambda_h \, p_1^{(h)}(i_1) \, p_2^{(h)}(i_2) \cdots p_k^{(h)}(i_k),

where \lambda_h = \Pr(H = h) and p_j^{(h)}(i_j) = \Pr(X_j = i_j \mid H = h).
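The naive Bayes parameterization can be sketched in a few lines (function and variable names are ours): build the full joint table from the mixing weights and the per-variable conditional distributions.

```python
import numpy as np

# conds[j][i, h] = Pr(X_j = i | H = h); lam[h] = Pr(H = h).
# Returns the joint table p with p[i1, ..., ik] as in the formula above.
def lc_joint(lam, conds):
    k = len(conds)
    letters = 'abcdefghijklmnopqrstuvwxy'   # one index letter per variable
    subs = ','.join(letters[j] + 'z' for j in range(k))
    # p_{i1...ik} = sum_h lam_h * prod_j conds[j][i_j, h]
    return np.einsum('z,' + subs + '->' + letters[:k], lam, *conds)
```

With r = 2 classes and two binary manifest variables, the result is a 2x2 table that sums to 1.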
The Likelihood (2-way Tables)

For the 2-class naive Bayes model with two manifest variables, the cell probabilities are

    p_{ij} = \sum_{h \in \{1,2\}} \lambda_h \alpha_{ih} \beta_{jh},

and the log-likelihood function is

    l(\theta) = \sum_{i,j} n_{ij} \log \Big( \sum_{h \in \{1,2\}} \lambda_h \alpha_{ih} \beta_{jh} \Big),

where \sum_h \lambda_h = \sum_i \alpha_{ih} = \sum_j \beta_{jh} = 1.
Issues for Estimation and Testing

1. Maximum likelihood estimation
- not an exponential family, so there is no standard theory for the existence and uniqueness of the MLE
- no summary statistics: the minimal sufficient statistics are the data themselves
- maxima of the likelihood computed by Newton-Raphson or the EM algorithm are local maxima
- the MLE for p = (p_{ij}) need not be unique (multimodality)
- there can be infinitely many MLEs for (\lambda_h, \alpha_{ih}, \beta_{jh}) (unidentifiability)

2. Goodness-of-fit testing
- because the model may be unidentifiable w.r.t. (\lambda_h, \alpha_{ih}, \beta_{jh}), computing the effective dimension is an issue
Geometric Derivation of LC Models (2-way Tables)

For a 2x2 table with cell probabilities (p_{11}, p_{12}; p_{21}, p_{22}), row margins p_{1+}, p_{2+}, and column margins p_{+1}, p_{+2}, the model of independence is p_{ij} = p_{i+} p_{+j} = \alpha_i \beta_j.

Now consider the polynomial map

    f : (\alpha_1, \alpha_2, \beta_1, \beta_2) \mapsto (\alpha_1\beta_1, \alpha_1\beta_2, \alpha_2\beta_1, \alpha_2\beta_2),

where each factor lies in \Delta_{d-1} = \{x \in \mathbb{R}^d : \sum_{i=1}^d x_i = 1, \; x_i \geq 0\}. With some computation, we get the image of the mapping:

    Image(f) = \{(p_{11}, p_{12}, p_{21}, p_{22}) : p_{11} p_{22} = p_{12} p_{21}\}.
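The image computation can be checked numerically; the example tables below are our own. A rank-1 (independence) table satisfies p11*p22 = p12*p21, while a genuine two-class mixture of such tables generally leaves the surface:

```python
import numpy as np

p_ind = np.outer([0.9, 0.1], [0.9, 0.1])            # point on the surface
p_mix = 0.5 * np.outer([0.9, 0.1], [0.9, 0.1]) + \
        0.5 * np.outer([0.1, 0.9], [0.1, 0.9])      # 2-class mixture
det_ind = p_ind[0, 0] * p_ind[1, 1] - p_ind[0, 1] * p_ind[1, 0]
det_mix = p_mix[0, 0] * p_mix[1, 1] - p_mix[0, 1] * p_mix[1, 0]
```

Here det_ind is 0 while det_mix = 0.41^2 - 0.09^2 = 0.16, so the mixture is a rank-2 point off the independence surface.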
Surface of 2-level LC Model for a 2x2 Table

    p_{ij} = \lambda_1 \alpha_{i1} \beta_{j1} + \lambda_2 \alpha_{i2} \beta_{j2},

i.e. the table is the mixture

    \lambda_1 \begin{pmatrix} \alpha_{11}\beta_{11} & \alpha_{11}\beta_{21} \\ \alpha_{21}\beta_{11} & \alpha_{21}\beta_{21} \end{pmatrix} + \lambda_2 \begin{pmatrix} \alpha_{12}\beta_{12} & \alpha_{12}\beta_{22} \\ \alpha_{22}\beta_{12} & \alpha_{22}\beta_{22} \end{pmatrix}.

Geometrically, with the surface of independence

    V = \{(p_{11}, p_{12}, p_{21}, p_{22}) : p_{11} p_{22} = p_{12} p_{21}\},

the model is the set of convex combinations

    S = \{\lambda_1 p + \lambda_2 q : p, q \in V, \; \lambda_1, \lambda_2 \geq 0, \; \lambda_1 + \lambda_2 = 1\}.
2x2 Table with One Binary Latent Variable is Identifiable

What about identifiability, for p_{ij} = \lambda_1 \alpha_{i1} \beta_{j1} + \lambda_2 \alpha_{i2} \beta_{j2}?

Definition (Identifiability). The mapping f : (\lambda_h, \alpha_{ih}, \beta_{jh}) \mapsto (p_{11}, p_{12}, p_{21}, p_{22}) is locally one-to-one.

1. Identifiability holds iff the symbolic rank of the Jacobian of f is full.
2. A necessary condition is that the dimension of Image(f) equals the expected dimension \min\{1 + 2(2-1) + 2(2-1), 3\} = 3.
3x3 Table with One Binary Latent Variable is Unidentifiable!

    p_{ij} = \sum_{h \in \{1,2\}} \lambda_h \alpha_{ih} \beta_{jh}, \quad i, j = 1, 2, 3.

We can compute the polynomials vanishing on Image(f), where f : (\lambda, \alpha, \beta) \mapsto p. The result is the determinant of the 3x3 table, so

    Image(f) = \{p = (p_{ij})_{i,j=1,2,3} : \det(p) = 0\}.

From this we can compute the dimension of Image(f): the effective dimension is 7, less than the standard dimension 2[(3-1) + (3-1)] + 1 = 9 and the expected dimension \min\{9, 8\} = 8.
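These dimension statements can be checked numerically; the helper below is our own sketch, not the talk's code. At a generic interior point, the rank of the Jacobian of the parameterization map equals the local dimension of Image(f).

```python
import numpy as np

def lc_prob(theta, I, J, r):
    """Free parameters -> I x J table; the last coordinate of each simplex
    is one minus the sum of the free coordinates."""
    t = iter(theta)
    lam = np.array([next(t) for _ in range(r - 1)])
    lam = np.append(lam, 1 - lam.sum())
    alpha = np.empty((I, r)); beta = np.empty((J, r))
    for h in range(r):
        a = np.array([next(t) for _ in range(I - 1)])
        alpha[:, h] = np.append(a, 1 - a.sum())
        b = np.array([next(t) for _ in range(J - 1)])
        beta[:, h] = np.append(b, 1 - b.sum())
    return (alpha * lam) @ beta.T       # p_ij = sum_h lam_h a_ih b_jh

def jacobian_rank(I, J, r, seed=0, h=1e-5, tol=1e-6):
    """Numerical rank of the Jacobian of f at a generic interior point."""
    rng = np.random.default_rng(seed)
    dim = (r - 1) + r * (I - 1) + r * (J - 1)
    theta = 0.2 + 0.6 * rng.random(dim) / max(I, J)
    jac = np.empty((I * J, dim))
    for m in range(dim):              # central finite differences
        e = np.zeros(dim); e[m] = h
        jac[:, m] = (lc_prob(theta + e, I, J, r)
                     - lc_prob(theta - e, I, J, r)).ravel() / (2 * h)
    return np.linalg.matrix_rank(jac, tol=tol)
```

Here jacobian_rank(3, 3, 2) gives 7, the effective dimension above, while jacobian_rank(2, 2, 2) gives 3, the full dimension of the probability simplex for a 2x2 table.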
Algebraic Tools

In the language of algebraic geometry, Image(f) is a variety. Below is the code we use in the symbolic software SINGULAR to compute the polynomials defining Image(f):

    ring r=0, (p11,p12,p13,p21,p22,p23,p31,p32,p33,h1,h2,
               a11,a21,a31,a12,a22,a32,b11,b21,b31,b12,b22,b32), lp;
    ideal I=p11-h1*a11*b11-h2*a12*b12,
            p12-h1*a11*b21-h2*a12*b22,
            p13-h1*a11*b31-h2*a12*b32,
            p21-h1*a21*b11-h2*a22*b12,
            p22-h1*a21*b21-h2*a22*b22,
            p23-h1*a21*b31-h2*a22*b32,
            p31-h1*a31*b11-h2*a32*b12,
            p32-h1*a31*b21-h2*a32*b22,
            p33-h1*a31*b31-h2*a32*b32,
            h1+h2-1,
            a11+a21+a31-1, a12+a22+a32-1,
            b11+b21+b31-1, b12+b22+b32-1,
            p11+p12+p13+p21+p22+p23+p31+p32+p33-1;
    ideal J=elim1(I,h1*h2*a11*a21*a31*a12*a22*a32*b11*b21*b31*b12*b22*b32);
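The same elimination idea can be sketched in Python with SymPy on the smaller 2x2 independence model (variable names are ours): eliminating the model parameters from the ideal of the parameterization leaves the polynomials that define Image(f).

```python
from sympy import symbols, groebner

# 2x2 independence model: p_ij = alpha_i * beta_j, with
# alpha_2 = 1 - a and beta_2 = 1 - b substituted in directly.
a, b, p11, p12, p21, p22 = symbols('a b p11 p12 p21 p22')
I = [p11 - a*b, p12 - a*(1 - b),
     p21 - (1 - a)*b, p22 - (1 - a)*(1 - b)]

# Lex order with a, b listed first: basis elements free of a and b
# generate the elimination ideal, i.e. the ideal of the image variety.
G = groebner(I, a, b, p11, p12, p21, p22, order='lex')
elim = [g for g in G.exprs if not g.has(a, b)]
```

The determinant p11*p22 - p12*p21 lies in this ideal, so it reduces to zero modulo the Groebner basis, recovering the surface of independence from earlier.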
Identifiability of LC Models for General 2-way Tables

Lemma.
1. The probability matrix P (I x J) for the r-level latent class model has rank at most r.
2. The image variety Image(f) is defined by all the (r+1) x (r+1) subdeterminants of P.

Theorem (Effective Dimension of a 2-way Table). The r-level latent class model for an I x J table has effective dimension (I + J - r)r - 1, and therefore the dimension of the unidentifiable space for (\lambda_h, \alpha_{ih}, \beta_{jh}) is SD - ED = r(r - 1).
100 Swiss Francs Problem: Fitting

We want to fit a 2-level latent class model to the observed table n. Two equivalent formulations:

    \max_{\lambda,\alpha,\beta} \sum_{i,j} n_{ij} \log \Big( \sum_h \lambda_h \alpha_{ih} \beta_{jh} \Big)
    subject to \sum_h \lambda_h = \sum_i \alpha_{ih} = \sum_j \beta_{jh} = 1,

or

    \max_p \sum_{i,j} n_{ij} \log p_{ij}   subject to \det(p^{(ij)}) = 0 for all i, j   (i.e. rank(p) = 2),

where p^{(ij)} is the 3x3 submatrix of p obtained by erasing the i-th row and the j-th column.
How Many Maxima Are There?

At the outset I showed you a solution, but there are actually 7 maxima, of which that is one! We found them by repeated use of EM. How many of them are global?
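The repeated-EM search can be sketched as follows. This is our own minimal EM implementation for the two-class model, not the authors' code, run from several random starts on the Swiss Francs table.

```python
import numpy as np

def em_lc(n, r=2, iters=2000, seed=0):
    """EM for an r-class latent class model on a two-way table of counts."""
    rng = np.random.default_rng(seed)
    I, J = n.shape
    lam = rng.dirichlet(np.ones(r))
    alpha = rng.dirichlet(np.ones(I), size=r).T   # alpha[i, h]
    beta = rng.dirichlet(np.ones(J), size=r).T    # beta[j, h]
    for _ in range(iters):
        q = lam * alpha[:, None, :] * beta[None, :, :]    # (I, J, r)
        w = n[..., None] * q / q.sum(-1, keepdims=True)   # E-step weights
        s = w.sum((0, 1))
        lam = s / n.sum()                                 # M-step updates
        alpha = w.sum(1) / s
        beta = w.sum(0) / s
    p = (alpha * lam) @ beta.T
    return p, float((n * np.log(p)).sum())

n = np.full((4, 4), 2.0)
np.fill_diagonal(n, 4.0)           # the 100 Swiss Francs table
best = max(em_lc(n, seed=s)[1] for s in range(20))
```

Each restart converges to one of a handful of stationary points; `best` approximates the global maximum of l(p), which lies strictly between the independence fit 40 log(1/16) and the saturated value, roughly -110.9 and -108.7.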
Maxima of the Log-likelihood Function

[The slide lists the fitted tables at the seven maxima; the matrices were garbled in transcription, but their entries (times 40) are drawn from {3, 8/3, 2}.]
The Shape of the Likelihood Function

[Figure: a profile likelihood for the parameters (\alpha_{11}, \alpha_{21}, \alpha_{31}); image not included in the transcription.]
2-D Unidentifiable Subspaces for the 7 Local Maxima

We know the degree of deficiency in the parameter space is 2, so for each MLE (or local maximum) there is a 2-dimensional subspace of the parameter space corresponding to it, cut out by a polynomial such as

    c \, \alpha_{11}\beta_{11}\lambda_1 - 20 \alpha_{11}\lambda_1 - 20 \beta_{11}\lambda_1 + 6\lambda_1 - 1 = 0

[the leading coefficient c and the signs were garbled in transcription].
2-D Unidentifiable Subspaces for the 7 Local Maxima (cont.)

[The slide pairs each fitted table with the polynomial defining its 2-dimensional unidentifiable subspace; the tables were garbled in transcription.] Two of the defining polynomials, with signs reconstructed:

    80 \alpha_{11}\beta_{11}\lambda_1 - 20 \alpha_{11}\lambda_1 - 20 \beta_{11}\lambda_1 + 8\lambda_1 - 3 = 0
    240 \alpha_{11}\beta_{11}\lambda_1 - 60 \alpha_{11}\lambda_1 - 60 \beta_{11}\lambda_1 + 20\lambda_1 - 1 = 0
Summary for the 100 Swiss Francs Problem

[The slide compares the observed table with an MLE and a local maximum; the tables were garbled in transcription.]

The idea generalizes to general 2-way square tables with symmetric data:
- symmetric MLE and local maxima
- multiple MLEs
- averaging
Data From the National Long Term Care Survey (NLTCS)

A 2^16 contingency table with data on 6 activities of daily living (ADL) and 10 instrumental activities of daily living (IADL), extracted by Erosheva for community-dwelling elderly from the 1982, 1984, 1989, and 1994 survey waves. Of the 65,536 cells in the table, 62,384 (95.19%) contain zero counts, 1,729 (2.64%) contain a count of 1, and 499 (0.76%) contain counts of 2. [The largest cell count was lost in transcription.] Data analyzed in Erosheva, Fienberg, and Joutard (2007).

ADL: eating; getting in/out of bed; getting around inside; dressing; bathing; getting to the bathroom.
IADL: doing heavy housework; doing light housework; doing laundry; cooking; grocery shopping; getting about outside; travelling; managing money; taking medicine; telephoning.
Model Selection for the NLTCS Extract

BIC and log-likelihood values for various values of r. [Table: r, dimension, maximal log-likelihood, BIC; the values were lost in transcription.]
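A hedged sketch of the selection criterion behind the (lost) table values, assuming the usual definition BIC = -2 loglik + dim log N; the talk's exact convention is not recoverable from the transcription, and for a deficient model the dimension used should be the effective dimension.

```python
from math import log

# Smaller BIC is preferred; `dim` is the model dimension and N the
# total count in the table.
def bic(loglik, dim, N):
    return -2.0 * loglik + dim * log(N)
```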
Fitted Values for the Largest Six Cells in the NLTCS Extract

[Table: observed counts and fitted values under various r for the six largest cells; the values were lost in transcription.]
Computational Approaches for LCM MLEs

Expectation-Maximization (EM)
- a hill-climbing method that converges steadily
- but converges only linearly
- time complexity for a single step is O(d \cdot r \cdot \sum_i d_i) and space complexity is O(d \cdot r)

Newton-Raphson
- converges quadratically
- tends to be very time- and space-intensive: both time and space complexity are O(d \cdot r^2 \cdot \sum_i d_i)
- numerically unstable if the Hessian matrix is poorly conditioned

Modified Newton-Raphson
- modify the Hessian matrices so they remain negative definite
- then approximate the log-likelihood locally by a quadratic function
- since the log-likelihood is neither concave nor quadratic, these modifications don't necessarily guarantee an increase of the log-likelihood at each iteration step
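One common way to implement the Hessian modification described above is an eigenvalue clamp; the routine below is a generic sketch of that idea, not the talk's implementation.

```python
import numpy as np

# Clamp the (symmetric) Hessian's eigenvalues at -floor so the modified
# Hessian is negative definite, then return the Newton step for
# maximization: -H_mod^{-1} grad.
def modified_newton_step(grad, hess, floor=1e-6):
    w, V = np.linalg.eigh(hess)
    w = np.minimum(w, -floor)               # force eigenvalues <= -floor
    return -(V * (1.0 / w)) @ V.T @ grad    # V diag(1/w) V^T is H_mod^{-1}
```

On a strictly concave quadratic the step jumps straight to the maximizer; on an indefinite Hessian the clamp still yields an ascent direction (grad dot step > 0).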
Condition Numbers of Hessian Matrices

Condition numbers of the Hessian matrices at the maxima for the NLTCS data. [Table: condition number for each r; the values were lost in transcription.]
Profile Likelihood for r = 2

[Figure not included in the transcription.]
Summary

- Latent Class Models: existence and uniqueness of the MLE; identifiability; model selection
- Computational tools: SINGULAR; Expectation-Maximization; Newton-Raphson
- Examples: the 100 Swiss Francs problem; a sparse 2^16 table from the National Long Term Care Survey
Thank you!
References

S.E. Fienberg, P. Hersh, A. Rinaldo, and Y. Zhou (2008). Maximum likelihood estimation in latent class models for contingency tables. In P. Gibilisco, E. Riccomagno, M.-P. Rogantin (eds.), Algebraic and Geometric Methods in Probability and Statistics, Cambridge University Press, to appear.
Effective Dimension and Deficiency

For a latent class model,

    p_{i_1,...,i_k} = \sum_{h=1}^{r} \lambda_h \, p^{(1)}_{i_1 h} \cdots p^{(k)}_{i_k h}, \quad i_j = 1, ..., d_j,

- standard dimension: the dimension of the fully observable model of conditional independence, which is r \sum_i (d_i - 1) + r - 1.
- expected dimension: \min\{d - 1, \; r \sum_i (d_i - 1) + r - 1\}, where d = \prod_i d_i is the number of cells of the table.
- effective dimension: the actual dimension of the model.

Definition (Deficiency). A latent class model is deficient if the effective dimension is smaller than the expected dimension.
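The standard- and expected-dimension formulas above can be written directly as code (helper names are ours):

```python
from math import prod

# ds = list of category counts d_i; r = number of latent classes.
def standard_dim(ds, r):
    # fully observable conditional-independence model
    return r * sum(d - 1 for d in ds) + r - 1

def expected_dim(ds, r):
    # capped by the dimension d - 1 of the probability simplex
    return min(prod(ds) - 1, standard_dim(ds, r))
```

For the 3x3 table with r = 2, standard_dim([3, 3], 2) = 9 and expected_dim([3, 3], 2) = 8, matching the earlier slide; the effective dimension there is 7, so that model is deficient, with SD - ED = r(r - 1) = 2.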
Definition (Variety). The zero set of a system of polynomials; in our examples it is a hypersurface, such as the surface of independence we've seen before.

Definition (Polynomial ring). The set of polynomials in one or more variables with coefficients in a ring, e.g. R[x] or Q[x, y].

Definition (Ideal). An ideal I is a subset of a ring R satisfying f + g \in I whenever f, g \in I, and pf \in I whenever f \in I and p \in R is arbitrary. For example, the even numbers and the multiples of 3 are ideals of the ring of integers, with generating sets {2} and {3} respectively.

Definition (Ideal of a Variety). The set of polynomials vanishing on the variety; for example, the polynomial p_{11} p_{22} - p_{12} p_{21} generates the ideal of the surface of independence.
Different Dimensions of Some Latent Class Models

[Table: effective dimension, standard dimension, complete dimension d - 1, and deficiency for several latent class models; the values were lost in transcription.]
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More informationMATH 31 - ADDITIONAL PRACTICE PROBLEMS FOR FINAL
MATH 3 - ADDITIONAL PRACTICE PROBLEMS FOR FINAL MAIN TOPICS FOR THE FINAL EXAM:. Vectors. Dot product. Cross product. Geometric applications. 2. Row reduction. Null space, column space, row space, left
More informationFinding normalized and modularity cuts by spectral clustering. Ljubjana 2010, October
Finding normalized and modularity cuts by spectral clustering Marianna Bolla Institute of Mathematics Budapest University of Technology and Economics marib@math.bme.hu Ljubjana 2010, October Outline Find
More informationIntroduction to Machine Learning
Introduction to Machine Learning 3. Instance Based Learning Alex Smola Carnegie Mellon University http://alex.smola.org/teaching/cmu2013-10-701 10-701 Outline Parzen Windows Kernels, algorithm Model selection
More informationProblems in Linear Algebra and Representation Theory
Problems in Linear Algebra and Representation Theory (Most of these were provided by Victor Ginzburg) The problems appearing below have varying level of difficulty. They are not listed in any specific
More informationALGEBRAIC GEOMETRY COURSE NOTES, LECTURE 2: HILBERT S NULLSTELLENSATZ.
ALGEBRAIC GEOMETRY COURSE NOTES, LECTURE 2: HILBERT S NULLSTELLENSATZ. ANDREW SALCH 1. Hilbert s Nullstellensatz. The last lecture left off with the claim that, if J k[x 1,..., x n ] is an ideal, then
More information9. The determinant. Notation: Also: A matrix, det(a) = A IR determinant of A. Calculation in the special cases n = 2 and n = 3:
9. The determinant The determinant is a function (with real numbers as values) which is defined for square matrices. It allows to make conclusions about the rank and appears in diverse theorems and formulas.
More informationExpectation Maximization (EM) Algorithm. Each has it s own probability of seeing H on any one flip. Let. p 1 = P ( H on Coin 1 )
Expectation Maximization (EM Algorithm Motivating Example: Have two coins: Coin 1 and Coin 2 Each has it s own probability of seeing H on any one flip. Let p 1 = P ( H on Coin 1 p 2 = P ( H on Coin 2 Select
More informationData Mining and Matrices
Data Mining and Matrices 05 Semi-Discrete Decomposition Rainer Gemulla, Pauli Miettinen May 16, 2013 Outline 1 Hunting the Bump 2 Semi-Discrete Decomposition 3 The Algorithm 4 Applications SDD alone SVD
More informationLikelihood-Based Methods
Likelihood-Based Methods Handbook of Spatial Statistics, Chapter 4 Susheela Singh September 22, 2016 OVERVIEW INTRODUCTION MAXIMUM LIKELIHOOD ESTIMATION (ML) RESTRICTED MAXIMUM LIKELIHOOD ESTIMATION (REML)
More informationIntroduction to Machine Learning
Outline Introduction to Machine Learning Bayesian Classification Varun Chandola March 8, 017 1. {circular,large,light,smooth,thick}, malignant. {circular,large,light,irregular,thick}, malignant 3. {oval,large,dark,smooth,thin},
More informationLinear Algebra Massoud Malek
CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product
More informationChapter 5: Integer Compositions and Partitions and Set Partitions
Chapter 5: Integer Compositions and Partitions and Set Partitions Prof. Tesler Math 184A Winter 2017 Prof. Tesler Ch. 5: Compositions and Partitions Math 184A / Winter 2017 1 / 32 5.1. Compositions A strict
More informationACI-matrices all of whose completions have the same rank
ACI-matrices all of whose completions have the same rank Zejun Huang, Xingzhi Zhan Department of Mathematics East China Normal University Shanghai 200241, China Abstract We characterize the ACI-matrices
More informationPhasing via the Expectation Maximization (EM) Algorithm
Computing Haplotype Frequencies and Haplotype Phasing via the Expectation Maximization (EM) Algorithm Department of Computer Science Brown University, Providence sorin@cs.brown.edu September 14, 2010 Outline
More informationSTA 294: Stochastic Processes & Bayesian Nonparametrics
MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a
More informationChap 3. Linear Algebra
Chap 3. Linear Algebra Outlines 1. Introduction 2. Basis, Representation, and Orthonormalization 3. Linear Algebraic Equations 4. Similarity Transformation 5. Diagonal Form and Jordan Form 6. Functions
More informationCh 4. Linear Models for Classification
Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,
More informationLearning P-maps Param. Learning
Readings: K&F: 3.3, 3.4, 16.1, 16.2, 16.3, 16.4 Learning P-maps Param. Learning Graphical Models 10708 Carlos Guestrin Carnegie Mellon University September 24 th, 2008 10-708 Carlos Guestrin 2006-2008
More informationProbability Theory for Machine Learning. Chris Cremer September 2015
Probability Theory for Machine Learning Chris Cremer September 2015 Outline Motivation Probability Definitions and Rules Probability Distributions MLE for Gaussian Parameter Estimation MLE and Least Squares
More informationMATRICES AND ITS APPLICATIONS
MATRICES AND ITS Elementary transformations and elementary matrices Inverse using elementary transformations Rank of a matrix Normal form of a matrix Linear dependence and independence of vectors APPLICATIONS
More informationStat 451 Lecture Notes Numerical Integration
Stat 451 Lecture Notes 03 12 Numerical Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 5 in Givens & Hoeting, and Chapters 4 & 18 of Lange 2 Updated: February 11, 2016 1 / 29
More informationOpen Problems in Algebraic Statistics
Open Problems inalgebraic Statistics p. Open Problems in Algebraic Statistics BERND STURMFELS UNIVERSITY OF CALIFORNIA, BERKELEY and TECHNISCHE UNIVERSITÄT BERLIN Advertisement Oberwolfach Seminar Algebraic
More informationMatrix Multiplication
228 hapter Three Maps etween Spaces IV2 Matrix Multiplication After representing addition and scalar multiplication of linear maps in the prior subsection, the natural next operation to consider is function
More informationLinear Models in Machine Learning
CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,
More informationChapter 2: Linear Programming Basics. (Bertsimas & Tsitsiklis, Chapter 1)
Chapter 2: Linear Programming Basics (Bertsimas & Tsitsiklis, Chapter 1) 33 Example of a Linear Program Remarks. minimize 2x 1 x 2 + 4x 3 subject to x 1 + x 2 + x 4 2 3x 2 x 3 = 5 x 3 + x 4 3 x 1 0 x 3
More informationPh.D. Qualifying Exam Friday Saturday, January 3 4, 2014
Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014 Put your solution to each problem on a separate sheet of paper. Problem 1. (5166) Assume that two random samples {x i } and {y i } are independently
More informationStatistical Process Control for Multivariate Categorical Processes
Statistical Process Control for Multivariate Categorical Processes Fugee Tsung The Hong Kong University of Science and Technology Fugee Tsung 1/27 Introduction Typical Control Charts Univariate continuous
More informationLeast Squares Regression
E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute
More informationCS 6820 Fall 2014 Lectures, October 3-20, 2014
Analysis of Algorithms Linear Programming Notes CS 6820 Fall 2014 Lectures, October 3-20, 2014 1 Linear programming The linear programming (LP) problem is the following optimization problem. We are given
More informationGradient Descent. Dr. Xiaowei Huang
Gradient Descent Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Three machine learning algorithms: decision tree learning k-nn linear regression only optimization objectives are discussed,
More informationWhat is Singular Learning Theory?
What is Singular Learning Theory? Shaowei Lin (UC Berkeley) shaowei@math.berkeley.edu 23 Sep 2011 McGill University Singular Learning Theory A statistical model is regular if it is identifiable and its
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationA Practical Algorithm for Topic Modeling with Provable Guarantees
1 A Practical Algorithm for Topic Modeling with Provable Guarantees Sanjeev Arora, Rong Ge, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, Michael Zhu Reviewed by Zhao Song December
More informationA minimalist s exposition of EM
A minimalist s exposition of EM Karl Stratos 1 What EM optimizes Let O, H be a random variables representing the space of samples. Let be the parameter of a generative model with an associated probability
More informationSolving Homogeneous Systems with Sub-matrices
Pure Mathematical Sciences, Vol 7, 218, no 1, 11-18 HIKARI Ltd, wwwm-hikaricom https://doiorg/112988/pms218843 Solving Homogeneous Systems with Sub-matrices Massoud Malek Mathematics, California State
More informationhe Applications of Tensor Factorization in Inference, Clustering, Graph Theory, Coding and Visual Representation
he Applications of Tensor Factorization in Inference, Clustering, Graph Theory, Coding and Visual Representation Amnon Shashua School of Computer Science & Eng. The Hebrew University Matrix Factorization
More information(1) for all (2) for all and all
8. Linear mappings and matrices A mapping f from IR n to IR m is called linear if it fulfills the following two properties: (1) for all (2) for all and all Mappings of this sort appear frequently in the
More informationEAD 115. Numerical Solution of Engineering and Scientific Problems. David M. Rocke Department of Applied Science
EAD 115 Numerical Solution of Engineering and Scientific Problems David M. Rocke Department of Applied Science Taylor s Theorem Can often approximate a function by a polynomial The error in the approximation
More informationPROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 2, 2015 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationLinear Algebra II. 2 Matrices. Notes 2 21st October Matrix algebra
MTH6140 Linear Algebra II Notes 2 21st October 2010 2 Matrices You have certainly seen matrices before; indeed, we met some in the first chapter of the notes Here we revise matrix algebra, consider row
More informationHermite normal form: Computation and applications
Integer Points in Polyhedra Gennady Shmonin Hermite normal form: Computation and applications February 24, 2009 1 Uniqueness of Hermite normal form In the last lecture, we showed that if B is a rational
More informationParameter estimation in linear Gaussian covariance models
Parameter estimation in linear Gaussian covariance models Caroline Uhler (IST Austria) Joint work with Piotr Zwiernik (UC Berkeley) and Donald Richards (Penn State University) Big Data Reunion Workshop
More information