Maximum Likelihood Estimation in Latent Class Models for Contingency Table Data


Slide 1: Maximum Likelihood Estimation in Latent Class Models for Contingency Table Data

Stephen E. Fienberg
Department of Statistics, Machine Learning Department, and Cylab, Carnegie Mellon University
May 20, 2008
(Joint work with P. Hersh, A. Rinaldo, and Yi Zhou)

Slide 2: Outline

- Latent class models: existence and uniqueness of the MLE, identifiability, model selection
- Algebraic geometry: the geometric description of LC models
- Examples: the 100 Swiss Francs problem; a sparse 2^16 table from the National Long Term Care Survey

Slides 3-6: 100 Swiss Francs Problem

Observed table:

    n = [ 4 2 2 2
          2 4 2 2
          2 2 4 2
          2 2 2 4 ]

Log-likelihood:

    l(p) = 4 Σ_i log p_ii + 2 Σ_{i≠j} log p_ij

Problem, version 1: find the MLE for the 4 x 4 contingency table n under the latent class model with two classes.

Problem, version 2: find the 4 x 4 matrix p of rank 2 that is closest, in the sense of maximum likelihood, to the empirical distribution (1/40) n.

Is the following table (shown as 40 times the fitted probabilities) an MLE? Can you prove it is a global maximum?

    [ 3 3 2 2
      3 3 2 2
      2 2 3 3
      2 2 3 3 ]

Is the MLE unique? If not, can you suggest other MLEs?
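
The following is a minimal EM sketch, not part of the slides, for fitting the two-class model to the table above; the names lam, a, b stand for the parameters λ, α, β introduced on the later slides, and the starting values are arbitrary. Depending on the random start it typically converges to one of the seven local maxima discussed below.

import numpy as np

n = np.array([[4., 2, 2, 2],
              [2, 4, 2, 2],
              [2, 2, 4, 2],
              [2, 2, 2, 4]])
N = n.sum()

rng = np.random.default_rng(0)
lam = np.full(2, 0.5)
a = rng.dirichlet(np.ones(4), size=2).T   # a[i, h] = alpha_{ih}, columns sum to 1
b = rng.dirichlet(np.ones(4), size=2).T   # b[j, h] = beta_{jh}, columns sum to 1

for _ in range(2000):
    # E-step: posterior weight of each class h for each cell (i, j)
    joint = lam[None, None, :] * a[:, None, :] * b[None, :, :]   # shape (4, 4, 2)
    post = joint / joint.sum(axis=2, keepdims=True)
    # M-step: re-estimate the parameters from the expected class-specific counts
    w = n[:, :, None] * post
    lam = w.sum(axis=(0, 1)) / N
    a = w.sum(axis=1) / w.sum(axis=(0, 1))
    b = w.sum(axis=0) / w.sum(axis=(0, 1))

p = (lam[None, None, :] * a[:, None, :] * b[None, :, :]).sum(axis=2)
print(np.round(40 * p, 3))          # fitted table on the scale of the counts
print((n * np.log(p)).sum())        # log-likelihood at the fitted point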

Slide 7: Categorical Data and Contingency Tables

Consider k categorical variables X_1, ..., X_k, where each X_i takes values in the finite set [d_i] = {1, ..., d_i}. The cross-classification of N i.i.d. realizations of (X_1, ..., X_k) produces a random integer-valued vector n with entries

    n_{i_1,...,i_k} = Σ_{j=1}^N 1{X_1^(j) = i_1, ..., X_k^(j) = i_k}.

The contingency table n has a Multinomial(N, p) distribution, where d = Π_{i=1}^k d_i and p is a point in the (d - 1)-dimensional probability simplex Δ_{d-1} with coordinates

    p_{i_1,...,i_k} = Pr{(X_1, ..., X_k) = (i_1, ..., i_k)}.

The likelihood function is

    L(p) = ( N! / Π_{i_1,...,i_k} n_{i_1,...,i_k}! ) Π_{i_1,...,i_k} p_{i_1,...,i_k}^{n_{i_1,...,i_k}}.

Slide 8: Latent Structure

Let H be an unobservable latent variable taking values in the set [r] = {1, ..., r}. In its most basic version, a.k.a. the naive Bayes model, the LC model postulates that, conditional on H, the variables X_1, ..., X_k are mutually independent:

    p_{i_1,...,i_k} = Σ_{h=1}^r p_{i_1,...,i_k,h} = Σ_{h=1}^r λ_h p_1^(h)(i_1) p_2^(h)(i_2) ··· p_k^(h)(i_k),

where λ_h = Pr(H = h) and p_k^(h)(i_k) = Pr(X_k = i_k | H = h).
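
As an illustration of this parametrization (not from the slides), the sketch below builds the full cell-probability array of a naive Bayes / latent class model from mixing weights and conditional distributions; the function name lc_probabilities and the example numbers are purely illustrative.

import numpy as np
from functools import reduce

def lc_probabilities(lam, conditionals):
    # lam: (r,) mixing weights; conditionals: list of (d_i, r) arrays whose
    # columns are P(X_i = . | H = h).  Returns the d_1 x ... x d_k array p.
    p = np.zeros([c.shape[0] for c in conditionals])
    for h in range(len(lam)):
        # outer product of the k conditional distributions for class h
        p += lam[h] * reduce(np.multiply.outer, [c[:, h] for c in conditionals])
    return p

lam = np.array([0.6, 0.4])
conds = [np.array([[0.7, 0.2],
                   [0.3, 0.8]]),                 # binary X_1
         np.array([[0.5, 0.1],
                   [0.3, 0.3],
                   [0.2, 0.6]])]                 # ternary X_2
p = lc_probabilities(lam, conds)
print(p.shape, p.sum())                          # (2, 3) and 1.0 up to rounding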

Slide 9: The Likelihood (2-way Tables)

For the 2-class naive Bayes model with two manifest variables, the cell probabilities are

    p_ij = Σ_{h ∈ {1,2}} λ_h α_ih β_jh,

and the log-likelihood function is

    l(θ) = Σ_{i,j} n_ij log ( Σ_{h ∈ {1,2}} λ_h α_ih β_jh ),

where Σ_h λ_h = Σ_i α_ih = Σ_j β_jh = 1.

Slides 10-12: Issues for Estimation and Testing

1. Maximum likelihood estimation
   - not an exponential family, so there is no general theory for the existence and uniqueness of the MLE
   - no reduction by sufficiency: the minimal sufficient statistics are the data themselves
   - maxima of the likelihood computed by Newton-Raphson or the EM algorithm are only local maxima
   - the MLE for p = (p_ij) need not be unique (multimodality)
   - there can be infinitely many MLEs for (λ_h, α_ih, β_jh) (unidentifiability)

2. Goodness-of-fit testing
   - because the model may be unidentifiable with respect to (λ_h, α_ih, β_jh), computing the effective dimension is an issue

Slides 13-15: Geometric Derivation of LC Models (2-way Tables)

2 x 2 table with margins:

    [ p_11  p_12 | p_1+ ]
    [ p_21  p_22 | p_2+ ]
    [ p_+1  p_+2 | p_++ ]

Model of independence: p_ij = p_i+ p_+j = α_i β_j.

Now consider the polynomial map

    f : Δ_1 × Δ_1 → Δ_3,   (α_1, α_2, β_1, β_2) ↦ ( α_1 β_1, α_1 β_2, α_2 β_1, α_2 β_2 ),

where Δ_{d-1} = {x ∈ R^d : Σ_{i=1}^d x_i = 1, x_i ≥ 0}. With some computation, we get the image of the mapping:

    Image(f) = {(p_11, p_12, p_21, p_22) : p_11 p_22 = p_12 p_21},

the surface of independence.
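
A one-line numerical check, illustrative and not from the slides, that tables of the form p_ij = α_i β_j satisfy the equation p_11 p_22 = p_12 p_21 defining Image(f):

import numpy as np

rng = np.random.default_rng(0)
alpha = rng.dirichlet(np.ones(2))            # (alpha_1, alpha_2)
beta = rng.dirichlet(np.ones(2))             # (beta_1, beta_2)
p = np.outer(alpha, beta)                    # p_ij = alpha_i * beta_j
print(np.isclose(p[0, 0] * p[1, 1], p[0, 1] * p[1, 0]))   # True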

Slides 16-18: Surface of 2-level LC Model for 2 x 2 Table

    p_ij = λ_1 α_i1 β_j1 + λ_2 α_i2 β_j2

i.e., as a table,

    p = λ_1 [ α_11 β_11   α_11 β_21     + λ_2 [ α_12 β_12   α_12 β_22
              α_21 β_11   α_21 β_21 ]            α_22 β_12   α_22 β_22 ]

The model is the set of mixtures of two points on the surface of independence:

    V = {(p_11, p_12, p_21, p_22) : p_11 p_22 = p_12 p_21}
    S = {λ_1 p + λ_2 q : p, q ∈ V, λ_1, λ_2 ≥ 0, λ_1 + λ_2 = 1}

[Figure: the surface V inside the simplex Δ_3.]

Slide 19: 2 x 2 Table with 1 Binary Latent Class Variable is Identifiable

What about identifiability?

    p_ij = λ_1 α_i1 β_j1 + λ_2 α_i2 β_j2

Definition (Identifiability): the mapping f : (λ_h, α_ih, β_jh) → (p_11, p_12, p_21, p_22) is locally one-to-one.

1. Identifiability holds if and only if the symbolic rank of the Jacobian of f is full.
2. A necessary condition is that the dimension of Image(f) equals the expected dimension min{1 + 2·2·(2 - 1), 3} = 3.

Slide 20: 3 x 3 Table with 1 Binary Latent Class Variable is Unidentifiable!

    p_ij = Σ_{h ∈ {1,2}} λ_h α_ih β_jh,   i, j = 1, 2, 3.

We can compute the polynomials vanishing on Image(f), where f : (λ, α, β) → p = (p_ij)_{i,j=1,2,3}. The only such polynomial is the determinant of the 3 x 3 table, so

    Image(f) = {p = (p_ij)_{i,j=1,2,3} : det(p) = 0}.

Then we can compute the dimension of Image(f): the effective dimension is 7, less than the standard dimension 2(3 - 1) + 2(3 - 1) + 1 = 9 and the expected dimension min{9, 8} = 8.
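
The claimed effective dimension can be checked numerically. The sketch below is my own illustration, not the authors' code: it parametrizes the 2-class model for the 3 x 3 table by its 9 free parameters, estimates the Jacobian of f by central differences, and reports its rank, which matches the effective dimension 7.

import numpy as np

rng = np.random.default_rng(1)

def f(theta):
    # free parameters: lambda_1, plus alpha_{ih} and beta_{jh} for i, j = 1, 2 and h = 1, 2
    lam = np.array([theta[0], 1 - theta[0]])
    a = theta[1:5].reshape(2, 2)
    b = theta[5:9].reshape(2, 2)
    a = np.vstack([a, 1 - a.sum(axis=0)])    # alpha[i, h], columns sum to 1
    b = np.vstack([b, 1 - b.sum(axis=0)])    # beta[j, h], columns sum to 1
    return (lam * a[:, None, :] * b[None, :, :]).sum(axis=2).ravel()

theta0 = rng.uniform(0.1, 0.4, size=9)
eps = 1e-6
J = np.column_stack([(f(theta0 + eps * e) - f(theta0 - eps * e)) / (2 * eps)
                     for e in np.eye(9)])
print(np.linalg.matrix_rank(J, tol=1e-6))    # expected to print 7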

Slide 21: Algebraic Tools

In the language of algebraic geometry, Image(f) is a variety. Below is the code we use in the symbolic software SINGULAR to compute the polynomials defining Image(f).

LIB "elim.lib";   // assumed: library providing elim1 (elimination of variables from an ideal)
ring r = 0, (p11,p12,p13,p21,p22,p23,p31,p32,p33,
             h1,h2, a11,a21,a31, a12,a22,a32,
             b11,b21,b31, b12,b22,b32), lp;
// generators: the parametrization p_ij = h1*a_i1*b_j1 + h2*a_i2*b_j2,
// plus the sum-to-one constraints on the parameters and the cell probabilities
ideal I = p11-h1*a11*b11-h2*a12*b12,
          p12-h1*a11*b21-h2*a12*b22,
          p13-h1*a11*b31-h2*a12*b32,
          p21-h1*a21*b11-h2*a22*b12,
          p22-h1*a21*b21-h2*a22*b22,
          p23-h1*a21*b31-h2*a22*b32,
          p31-h1*a31*b11-h2*a32*b12,
          p32-h1*a31*b21-h2*a32*b22,
          p33-h1*a31*b31-h2*a32*b32,
          h1+h2-1,
          a11+a21+a31-1, a12+a22+a32-1,
          b11+b21+b31-1, b12+b22+b32-1,
          p11+p12+p13+p21+p22+p23+p31+p32+p33-1;
// eliminate the parameters, leaving the polynomials in the p_ij alone
ideal J = elim1(I, h1*h2*a11*a21*a31*a12*a22*a32*b11*b21*b31*b12*b22*b32);

Slides 22-23: Identifiability of LC Models for General 2-way Tables

Lemma
1. The I x J probability matrix P of the r-level latent class model has rank at most r.
2. The image variety Image(f) is defined by all the (r + 1) x (r + 1) subdeterminants of P.

Theorem (Effective Dimension for 2-way Tables)
The r-level latent class model for an I x J table has effective dimension (I + J - r)r - 1, and therefore the dimension of the unidentifiable space for (λ_h, α_ih, β_jh) is SD - ED = r(r - 1).
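
A quick numerical illustration of the lemma, not from the slides: a randomly generated 2-class probability matrix for a 4 x 5 table has rank 2, so all of its 3 x 3 subdeterminants vanish up to rounding. The parameter names mirror the notation above.

import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
I, J, r = 4, 5, 2
lam = rng.dirichlet(np.ones(r))
alpha = rng.dirichlet(np.ones(I), size=r).T          # (I, r), columns sum to 1
beta = rng.dirichlet(np.ones(J), size=r).T           # (J, r), columns sum to 1
P = (lam * alpha[:, None, :] * beta[None, :, :]).sum(axis=2)

print(np.linalg.matrix_rank(P))                      # 2
minors = [np.linalg.det(P[np.ix_(rows, cols)])
          for rows in combinations(range(I), 3)
          for cols in combinations(range(J), 3)]
print(max(abs(m) for m in minors))                   # essentially 0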

Slides 24-25: 100 Swiss Francs Problem

Here is the observed table again:

    n = [ 4 2 2 2
          2 4 2 2
          2 2 4 2
          2 2 2 4 ]

We want to fit a 2-level latent class model to this table. Equivalently, solve either

    max_{λ,α,β}  Σ_{i,j} n_ij log ( Σ_h λ_h α_ih β_jh )   s.t.  Σ_h λ_h = Σ_i α_ih = Σ_j β_jh = 1,

or

    max_p  Σ_{i,j} n_ij log p_ij   s.t.  det(p^(ij)) = 0 for all i, j   (i.e. rank(p) ≤ 2),

where p^(ij) is the 3 x 3 submatrix of p obtained by erasing the ith row and the jth column.

Slides 26-29: How Many Maxima Are There?

At the outset, I told you about the candidate solution

    (1/40) [ 3 3 2 2
             3 3 2 2
             2 2 3 3
             2 2 3 3 ].

But there are actually 7 local maxima, of which this is one. We found them by repeated runs of EM from random starting points. How many of them are global?

Slide 30: Maxima of the Log-likelihood Function

The fitted tables at the seven maxima (shown as 40 times the fitted probabilities) are of two kinds:

- tables such as

      [ 3 3 2 2
        3 3 2 2
        2 2 3 3
        2 2 3 3 ],

  obtained by pairing up the rows and columns into two blocks;

- tables such as

      [ 8/3 8/3 8/3 2
        8/3 8/3 8/3 2
        8/3 8/3 8/3 2
        2   2   2   4 ],

  obtained by singling out one index.
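
A small check, not from the slides, that both kinds of fitted tables satisfy the rank-2 constraint, together with a comparison of their log-likelihoods; the values of roughly -110.10 and -110.15 are computed here rather than quoted from the talk.

import numpy as np

n = np.array([[4., 2, 2, 2], [2, 4, 2, 2], [2, 2, 4, 2], [2, 2, 2, 4]])
p_mle = np.array([[3., 3, 2, 2], [3, 3, 2, 2], [2, 2, 3, 3], [2, 2, 3, 3]]) / 40
p_loc = np.array([[8/3, 8/3, 8/3, 2], [8/3, 8/3, 8/3, 2],
                  [8/3, 8/3, 8/3, 2], [2, 2, 2, 4]]) / 40

for p in (p_mle, p_loc):
    print(np.linalg.matrix_rank(p), (n * np.log(p)).sum())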

Slide 31: The Shape of the Likelihood Function

[Figure: profile likelihood for the parameters (α_11, α_21, α_31).]

Slides 32-33: 2-D Unidentifiable Subspaces for the 7 Local Maxima

We know the degree of deficiency in the parameter space is 2. Therefore, for each MLE (or local maximum) there is a 2-dimensional subspace of the parameter space corresponding to it. For the MLE shown above, this subspace is cut out (in terms of λ_1, α_11, β_11) by

    80 α_11 β_11 λ_1 - 20 α_11 λ_1 - 20 β_11 λ_1 + 6 λ_1 - 1 = 0.
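
The sketch below is illustrative; the helper params_on_surface and the sample points are my own. It solves the surface equation above for β_11 given (λ_1, α_11), rebuilds a full parameter vector, and confirms that different points on the surface all yield the same fitted table (1/40)·[3 3 2 2; 3 3 2 2; 2 2 3 3; 2 2 3 3].

import numpy as np

def params_on_surface(l1, a11):
    # solve the surface equation for beta_11, then build the full (lambda, alpha, beta)
    b11 = (1 + 20 * a11 * l1 - 6 * l1) / (80 * a11 * l1 - 20 * l1)
    alpha1 = np.array([a11, a11, 0.5 - a11, 0.5 - a11])
    beta1 = np.array([b11, b11, 0.5 - b11, 0.5 - b11])
    alpha2 = (0.25 - l1 * alpha1) / (1 - l1)    # forces every row sum of p to be 1/4
    beta2 = (0.25 - l1 * beta1) / (1 - l1)      # forces every column sum of p to be 1/4
    lam = np.array([l1, 1 - l1])
    return lam, np.stack([alpha1, alpha2], axis=1), np.stack([beta1, beta2], axis=1)

for l1, a11 in [(0.5, 0.30), (0.4, 0.35), (0.6, 0.30)]:
    lam, a, b = params_on_surface(l1, a11)
    p = (lam * a[:, None, :] * b[None, :, :]).sum(axis=2)
    print(np.round(40 * p, 6))                  # the same table every time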

Slides 34-36: 2-Dim Unidentifiable Subspaces for the 7 Local Maxima (continued)

For the local maximum with fitted table (40 times the fitted probabilities)

    [ 4 2   2   2
      2 8/3 8/3 8/3
      2 8/3 8/3 8/3
      2 8/3 8/3 8/3 ],

the unidentifiable subspace is

    80 α_11 β_11 λ_1 - 20 α_11 λ_1 - 20 β_11 λ_1 + 8 λ_1 - 3 = 0;

for the local maximum with fitted table

    [ 8/3 8/3 8/3 2
      8/3 8/3 8/3 2
      8/3 8/3 8/3 2
      2   2   2   4 ],

it is

    240 α_11 β_11 λ_1 - 60 α_11 λ_1 - 60 β_11 λ_1 + 16 λ_1 - 1 = 0.

Slide 37: Summary for the 100 Swiss Francs Problem

    observed table n:              [ 4 2 2 2 ; 2 4 2 2 ; 2 2 4 2 ; 2 2 2 4 ]
    MLE (40·p̂):                    [ 3 3 2 2 ; 3 3 2 2 ; 2 2 3 3 ; 2 2 3 3 ]
    a local maximum (40·p̂):        [ 8/3 8/3 8/3 2 ; 8/3 8/3 8/3 2 ; 8/3 8/3 8/3 2 ; 2 2 2 4 ]

The idea generalizes to general 2-way square tables with symmetric data:
- symmetric MLEs and local maxima
- multiple MLEs
- averaging

Slide 38: Data From the National Long Term Care Survey (NLTCS)

A 2^16 contingency table with data on 6 activities of daily living (ADLs) and 10 instrumental activities of daily living (IADLs), extracted by Erosheva for community-dwelling elderly from the 1982, 1984, 1989, and 1994 survey waves.

Of the 65,536 cells in the table, 62,384 (95.19%) contain zero counts, 1,729 (2.64%) contain a count of 1, and 499 (0.76%) contain counts of 2. The largest cell count is 3,853.

Data analyzed in Erosheva, Fienberg, and Joutard (2007).

ADLs: eating; getting in/out of bed; getting around inside; dressing; bathing; getting to the bathroom.
IADLs: doing heavy housework; doing light housework; doing laundry; cooking; grocery shopping; getting about outside; travelling; managing money; taking medicine; telephoning.

Slide 39: Model Selection for the NLTCS Extract

[Table: model dimension, maximal log-likelihood, and BIC for latent class models with various numbers of classes r.]

Slide 40: Fitted Values for the Largest Six Cells in the NLTCS Extract

[Table: observed counts and fitted values for the six largest cells, for various numbers of classes r.]

Slides 41-42: Computational Approaches for LC Model MLEs

Expectation Maximization (EM)
- a hill-climbing method that converges steadily
- but converges only linearly
- time complexity for a single step is O(d · r · Σ_i d_i); space complexity is O(d · r)

Newton-Raphson (NR)
- converges quadratically
- tends to be very time and space intensive: both the time and space complexity are O(d · r² · Σ_i d_i)
- numerically unstable if the Hessian matrix is poorly conditioned

Modified NR approach
- modify the Hessian matrices so they remain negative definite
- then approximate the log-likelihood locally by a quadratic function
- since the log-likelihood is neither concave nor quadratic, these modifications do not guarantee an increase of the log-likelihood at each iteration step

Slide 43: Condition Numbers of Hessian Matrices

Condition numbers of the Hessian matrices at the maxima for the NLTCS data.

[Table: condition number (in scientific notation) of the Hessian at the maximum, for each number of classes r.]
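
To see the same phenomenon on the 100 Swiss Francs table, the sketch below (my own illustration, not the authors' code) runs EM to a stationary point, builds a finite-difference Hessian of the log-likelihood in the 13 free parameters, and prints its condition number, which is typically very large because of the unidentifiable directions.

import numpy as np

n = np.array([[4., 2, 2, 2], [2, 4, 2, 2], [2, 2, 4, 2], [2, 2, 2, 4]])

def loglik(theta):
    # 13 free parameters: lambda_1, then alpha and beta for rows/columns 1-3
    lam = np.array([theta[0], 1 - theta[0]])
    a = np.vstack([theta[1:7].reshape(3, 2), 1 - theta[1:7].reshape(3, 2).sum(0)])
    b = np.vstack([theta[7:13].reshape(3, 2), 1 - theta[7:13].reshape(3, 2).sum(0)])
    p = (lam * a[:, None, :] * b[None, :, :]).sum(axis=2)
    return (n * np.log(p)).sum()

# EM to reach a (local) maximum
rng = np.random.default_rng(3)
lam, a, b = np.array([0.5, 0.5]), rng.dirichlet(np.ones(4), 2).T, rng.dirichlet(np.ones(4), 2).T
for _ in range(5000):
    post = lam * a[:, None, :] * b[None, :, :]
    post /= post.sum(axis=2, keepdims=True)
    w = n[:, :, None] * post
    lam, a, b = w.sum((0, 1)) / n.sum(), w.sum(1) / w.sum((0, 1)), w.sum(0) / w.sum((0, 1))
theta = np.concatenate([lam[:1], a[:3].ravel(), b[:3].ravel()])

# central-difference Hessian of the log-likelihood at the EM solution
eps, E = 1e-4, np.eye(13)
H = np.array([[(loglik(theta + eps * (E[i] + E[j])) - loglik(theta + eps * (E[i] - E[j]))
                - loglik(theta - eps * (E[i] - E[j])) + loglik(theta - eps * (E[i] + E[j])))
               / (4 * eps ** 2) for j in range(13)] for i in range(13)])
print(np.linalg.cond(H))                        # typically very large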

Slide 44: Profile Likelihood for r = 2

[Figure: profile likelihood for the r = 2 model.]

Slide 45: Summary

- Latent class models: existence and uniqueness of the MLE, identifiability, model selection
- Computational tools: Singular, Expectation Maximization, the Newton-Raphson method
- Examples: the 100 Swiss Francs problem; a sparse 2^16 table from the National Long Term Care Survey

Slide 46: Thank you!

Slide 47: References

S.E. Fienberg, P. Hersh, A. Rinaldo, and Y. Zhou (2008). Maximum likelihood estimation in latent class models for contingency tables. In P. Gibilisco, E. Riccomagno, and M.-P. Rogantin (eds.), Algebraic and Geometric Methods in Probability and Statistics, Cambridge University Press, to appear.

Slide 48: Effective Dimension and Deficiency

For a latent class model,

    p_{i_1,...,i_k} = Σ_{h=1}^r λ_h p^(1)_{i_1 h} ··· p^(k)_{i_k h},   i_k = 1, ..., d_k.

- standard dimension: the dimension of the fully observable model of conditional independence, which is r Σ_i (d_i - 1) + r - 1.
- expected dimension: min { d - 1, r Σ_i (d_i - 1) + r - 1 }, where d = Π_i d_i is the number of cells of the table.
- effective dimension: the actual dimension of the model.

Definition (Deficiency): a latent class model is deficient if the effective dimension is smaller than the expected dimension.
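
A small helper (illustrative; the function names are mine) for the two dimension formulas above:

import numpy as np

def standard_dim(r, dims):
    # r * sum_i (d_i - 1) + r - 1
    return r * sum(d - 1 for d in dims) + r - 1

def expected_dim(r, dims):
    d = int(np.prod(dims))                      # number of cells of the table
    return min(d - 1, standard_dim(r, dims))

# 2-class model for a 3x3 table: standard 9, expected 8 (the effective dimension is 7)
print(standard_dim(2, [3, 3]), expected_dim(2, [3, 3]))
# 2-class model for the 2^16 NLTCS table
print(standard_dim(2, [2] * 16), expected_dim(2, [2] * 16))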

Slide 49: Algebraic Definitions

Definition (Variety): the zero set of a system of polynomials; geometrically, a (hyper)surface, such as the surface of independence we have seen before.

Definition (Polynomial ring): the set of polynomials in one or more variables with coefficients in a ring, e.g. R[x], Q[x, y].

Definition (Ideal): a subset I of a ring R satisfying f + g ∈ I whenever f, g ∈ I, and pf ∈ I whenever f ∈ I and p ∈ R is arbitrary. For example, the even numbers, or the multiples of 3, form ideals of the ring of integers, with generating sets {2} and {3} respectively.

Definition (Ideal of a variety): the set of polynomials vanishing on the variety (hypersurface). For example, the polynomial p_11 p_22 - p_12 p_21 generates the ideal of the surface of independence.

Slide 50: Different Dimensions of Some Latent Class Models

[Table: effective dimension, standard dimension, complete dimension (d - 1), and deficiency for a collection of latent class models, by table size and number of classes r.]
