Maximum likelihood in log-linear models


1 Graphical Models, Lecture 4, Michaelmas Term 2010. October 22, 2010.

2 Generating class

Let $\mathcal{A}$ denote an arbitrary set of subsets of $V$. A density $f$ (or function) factorizes w.r.t. $\mathcal{A}$ if there exist functions $\psi_a(x)$ which depend on $x_a$ only and

$$f(x) = \prod_{a \in \mathcal{A}} \psi_a(x).$$

This is similar to factorization w.r.t. a graph, but the sets in $\mathcal{A}$ are not necessarily complete subsets of a graph. The set $\mathcal{P}_{\mathcal{A}}$ of distributions which factorize w.r.t. $\mathcal{A}$ is the hierarchical log-linear model generated by $\mathcal{A}$. To avoid redundancy, it is common to assume the sets in $\mathcal{A}$ to be incomparable, in the sense that no set in $\mathcal{A}$ is contained in any other member of $\mathcal{A}$. $\mathcal{A}$ is the generating class of the log-linear model.
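For example, with $V = \{1, 2, 3\}$ and $\mathcal{A} = \{\{1,2\}, \{2,3\}\}$, a density factorizes w.r.t. $\mathcal{A}$ precisely when it can be written $f(x) = \psi_{12}(x_1, x_2)\,\psi_{23}(x_2, x_3)$; for positive densities this is the model with $X_1 \perp\!\!\!\perp X_3 \mid X_2$.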

3 Dependence graph of log-linear model

For any generating class $\mathcal{A}$ we construct the dependence graph $G(\mathcal{A}) = G(\mathcal{P}_{\mathcal{A}})$ of the log-linear model $\mathcal{P}_{\mathcal{A}}$. It is determined by the relation

$$\alpha \sim \beta \iff \exists a \in \mathcal{A} : \{\alpha, \beta\} \subseteq a.$$

Sets in $\mathcal{A}$ are clearly complete in $G(\mathcal{A})$, and therefore distributions in $\mathcal{P}_{\mathcal{A}}$ do factorize according to $G(\mathcal{A})$; they are thus also global, local, and pairwise Markov w.r.t. $G(\mathcal{A})$. On the other hand, no graph with fewer edges would suffice: the pairwise Markov property has to hold for all members of $\mathcal{P}_{\mathcal{A}}$, so in particular for all positive members.

4 Conformal graphical models

Just as a generating class defines a dependence graph $G(\mathcal{A})$, the reverse is also true: the set $\mathcal{C}(G)$ of cliques (maximal complete subsets) of $G$ is a generating class for the log-linear model of distributions which factorize w.r.t. $G$. If the dependence graph completely summarizes the restrictions imposed by $\mathcal{A}$, i.e. if $\mathcal{A} = \mathcal{C}(G(\mathcal{A}))$, $\mathcal{A}$ is said to be conformal.
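For example, $\mathcal{A} = \{\{1,2\}, \{2,3\}, \{1,3\}\}$ (the model of no three-factor interaction among three variables) has the complete graph on $\{1,2,3\}$ as its dependence graph, so $\mathcal{C}(G(\mathcal{A})) = \{\{1,2,3\}\} \neq \mathcal{A}$; this generating class is therefore not conformal.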

5 Factor graphs

The factor graph of $\mathcal{A}$ is the bipartite graph with vertex set $V \cup \mathcal{A}$ and edges defined by $\alpha \sim a \iff \alpha \in a$. [Figure: factor graph for $\mathcal{A} = \{\{I,J\}, \{I,K\}, \{J,K\}\}$, with variable nodes $I$, $J$, $K$ and factor nodes $\phi_{IJ}$, $\phi_{IK}$, $\phi_{JK}$.] Using this graph, even non-conformal log-linear models admit a simple visual representation.

6 Data in list form

Consider a sample $X^1 = x^1, \ldots, X^n = x^n$ from a distribution with probability mass function $p$. We refer to such data as being in list form, e.g. as

    case  Admitted  Sex
    1     Yes       Male
    2     Yes       Female
    3     No        Male
    4     Yes       Male
    ...

7 Contingency table

Data are often presented in the form of a contingency table or cross-classification, obtained from the list by sorting according to category:

                  Admitted
    Sex           Yes     No
    Male          1198    1493
    Female         557    1278

The numerical entries are cell counts $n(x) = |\{\nu : x^\nu = x\}|$ and the total number of observations is $n = \sum_{x \in \mathcal{X}} n(x)$.
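As a concrete illustration, here is a minimal Python sketch of turning list-form data into cell counts; the variable names are our own, and the toy records mirror the four cases listed above:

    # Build cell counts n(x) from data in list form.
    from collections import Counter

    # The four example cases from the list above, as (Admitted, Sex) pairs.
    records = [("Yes", "Male"), ("Yes", "Female"), ("No", "Male"), ("Yes", "Male")]

    n = Counter(records)          # n(x) = |{nu : x^nu = x}|
    print(n[("Yes", "Male")])     # cell count for (Yes, Male): 2
    print(sum(n.values()))        # total number of observations n: 4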

8 Assume now $p \in \mathcal{P}_{\mathcal{A}}$ but otherwise unknown. The likelihood function can be expressed as

$$L(p) = \prod_{\nu=1}^{n} p(x^\nu) = \prod_{x \in \mathcal{X}} p(x)^{n(x)}.$$

In contingency table form the data follow a multinomial distribution

$$P\{N(x) = n(x),\; x \in \mathcal{X}\} = \frac{n!}{\prod_{x \in \mathcal{X}} n(x)!} \prod_{x \in \mathcal{X}} p(x)^{n(x)},$$

but this only affects the likelihood function by a constant factor.

9 The likelihood function

$$L(p) = \prod_{x \in \mathcal{X}} p(x)^{n(x)}$$

is continuous as a function of the unknown probability distribution $p$ (an $|\mathcal{X}|$-dimensional vector). Since the closure $\overline{\mathcal{P}}_{\mathcal{A}}$ is compact (bounded and closed), $L$ attains its maximum on $\overline{\mathcal{P}}_{\mathcal{A}}$. Unfortunately, $\mathcal{P}_{\mathcal{A}}$ itself is not closed, so limits of factorizing distributions do not necessarily factorize. The maximum of the likelihood function is therefore not necessarily attained on $\mathcal{P}_{\mathcal{A}}$ itself, so in general it is necessary to include the boundary points.

10 Indeed, it is also true that $L$ has a unique maximum over $\overline{\mathcal{P}}_{\mathcal{A}}$, which we shall now show. For simplicity, we only establish uniqueness within $\mathcal{P}_{\mathcal{A}}$. The proof is indirect, but quite simple. Assume $p_1, p_2 \in \mathcal{P}_{\mathcal{A}}$ with $p_1 \neq p_2$ and

$$L(p_1) = L(p_2) = \sup_{p \in \overline{\mathcal{P}}_{\mathcal{A}}} L(p). \qquad (1)$$

Define $p_{12}(x) = c\sqrt{p_1(x)p_2(x)}$, where $c^{-1} = \sum_x \sqrt{p_1(x)p_2(x)}$ is a normalizing constant.

11 Then $p_{12} \in \mathcal{P}_{\mathcal{A}}$ because

$$p_{12}(x) = c\sqrt{p_1(x)p_2(x)} = c \prod_{a \in \mathcal{A}} \sqrt{\psi_a^1(x)\psi_a^2(x)} = \prod_{a \in \mathcal{A}} \psi_a^{12}(x),$$

where e.g. $\psi_a^{12}(x) = c^{1/|\mathcal{A}|} \sqrt{\psi_a^1(x)\psi_a^2(x)}$. The Cauchy-Schwarz inequality yields

$$c^{-1} = \sum_x \sqrt{p_1(x)p_2(x)} < \sqrt{\sum_x p_1(x)} \sqrt{\sum_x p_2(x)} = 1,$$

with strict inequality because $p_1 \neq p_2$; i.e. we have $c > 1$.

12 Hence

$$L(p_{12}) = \prod_x p_{12}(x)^{n(x)} = \prod_x \left\{ c\sqrt{p_1(x)p_2(x)} \right\}^{n(x)} = c^n \prod_x \sqrt{p_1(x)^{n(x)}} \sqrt{p_2(x)^{n(x)}} = c^n \sqrt{L(p_1)L(p_2)} > \sqrt{L(p_1)L(p_2)} = L(p_1) = L(p_2),$$

which contradicts (1). Hence we conclude $p_1 = p_2$. The extension to $\overline{\mathcal{P}}_{\mathcal{A}}$ is almost identical; it just needs a limit argument to establish that $p_1, p_2 \in \overline{\mathcal{P}}_{\mathcal{A}} \implies p_{12} \in \overline{\mathcal{P}}_{\mathcal{A}}$.

13 The maximum likelihood estimate $\hat{p}$ of $p$ is the unique element of $\overline{\mathcal{P}}_{\mathcal{A}}$ which satisfies the system of equations

$$n\hat{p}(x_a) = n(x_a), \quad a \in \mathcal{A},\; x_a \in \mathcal{X}_a. \qquad (2)$$

Here $g(x_a) = \sum_{y : y_a = x_a} g(y)$ is the $a$-marginal of the function $g$. The system of equations (2) expresses the fitting of the marginals in $\mathcal{A}$. It can be seen as an instance of the fact that in an exponential family (log-linear = exponential), the MLE is found by equating the sufficient statistics (here the marginal counts) to their expectation.
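For instance, with $\mathcal{A} = \{\{A\}, \{S\}\}$ as in the admissions example below, (2) reduces to the two one-way conditions $n\hat{p}(x_A) = n(x_A)$ and $n\hat{p}(x_S) = n(x_S)$: the fitted Admitted- and Sex-margins must equal the observed ones.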

14 Proof: Assume $p^* \in \overline{\mathcal{P}}_{\mathcal{A}}$ is a solution to the equations (2). That $p^*$ maximizes the likelihood function follows from the calculation below, where $p \in \mathcal{P}_{\mathcal{A}}$ is arbitrary and $\phi_a = \log \psi_a$:

$$\log L(p) = \sum_{x \in \mathcal{X}} n(x) \log p(x) = \sum_{x \in \mathcal{X}} n(x) \sum_{a \in \mathcal{A}} \phi_a(x) = \sum_{a \in \mathcal{A}} \sum_{x \in \mathcal{X}} n(x)\phi_a(x) = \sum_{a \in \mathcal{A}} \sum_{x_a \in \mathcal{X}_a} \sum_{y : y_a = x_a} n(y)\phi_a(y) = \sum_{a \in \mathcal{A}} \sum_{x_a \in \mathcal{X}_a} n(x_a)\phi_a(x_a),$$

since $n(x_a) = \sum_{y : y_a = x_a} n(y)$ and $\phi_a(y)$ depends on $y$ through $y_a$ only.

15 Further we get

$$\log L(p) = \sum_{a \in \mathcal{A}} \sum_{x_a \in \mathcal{X}_a} n(x_a)\phi_a(x_a) = \sum_{a \in \mathcal{A}} \sum_{x_a \in \mathcal{X}_a} n p^*(x_a)\phi_a(x_a) = \sum_{a \in \mathcal{A}} \sum_{x \in \mathcal{X}} n p^*(x)\phi_a(x) = \sum_{x \in \mathcal{X}} n p^*(x) \log p(x).$$

Thus, for any $p \in \mathcal{P}_{\mathcal{A}}$ we have established that

$$\log L(p) = \sum_{x \in \mathcal{X}} n p^*(x) \log p(x).$$

16 This is in particular also true for $p^*$. The information inequality (for probability distributions $q$ and $r$, $\sum_x q(x)\log r(x) \le \sum_x q(x)\log q(x)$, with equality iff $q = r$) now yields

$$\log L(p) = \sum_{x \in \mathcal{X}} n p^*(x) \log p(x) \le \sum_{x \in \mathcal{X}} n p^*(x) \log p^*(x) = \log L(p^*).$$

The case of $p \in \overline{\mathcal{P}}_{\mathcal{A}}$ needs an additional limit argument.

17 To show that the equations (2) indeed have a solution, we simply describe a convergent algorithm which solves them. The algorithm cycles repeatedly through all the $a$-marginals in $\mathcal{A}$ and fits them one by one. For $a \in \mathcal{A}$, define the following scaling operation on $p$:

$$(T_a p)(x) = p(x)\,\frac{n(x_a)}{n\,p(x_a)}, \quad x \in \mathcal{X},$$

where $0/0 = 0$ and $b/0$ is undefined if $b \neq 0$.

18 Fitting the marginals

The operation $T_a$ fits the $a$-marginal if $p(x_a) > 0$ whenever $n(x_a) > 0$:

$$n(T_a p)(x_a) = n \sum_{y : y_a = x_a} p(y)\,\frac{n(y_a)}{n\,p(y_a)} = n\,\frac{n(x_a)}{n\,p(x_a)} \sum_{y : y_a = x_a} p(y) = n\,\frac{n(x_a)}{n\,p(x_a)}\,p(x_a) = n(x_a).$$

Consequently, we have $T_a^2 = T_a$: no reason to do it twice.

19 Make an ordering of the generators $\mathcal{A} = \{a_1, \ldots, a_k\}$. Define $S$ by a full cycle of scalings

$$Sp = T_{a_k} \cdots T_{a_2} T_{a_1} p.$$

Define the iteration

$$p_0(x) = 1/|\mathcal{X}|, \qquad p_n = Sp_{n-1}, \; n = 1, \ldots.$$

It then holds that

$$\lim_{n \to \infty} p_n = \hat{p},$$

where $\hat{p}$ is the unique maximum likelihood estimate of $p \in \overline{\mathcal{P}}_{\mathcal{A}}$, i.e. the solution of the equation system (2).

20 Known as the IPS algorithm (iterative proportional scaling) or IPF algorithm (iterative proportional fitting), among a variety of other names. Implemented, somewhat inefficiently, in R in loglin, with front end loglm in MASS. Key elements in the proof:

1. If $p \in \overline{\mathcal{P}}_{\mathcal{A}}$, so is $T_a p$;
2. $T_a$ is continuous at any point $p$ of $\overline{\mathcal{P}}_{\mathcal{A}}$ with $p(x_a) > 0$ whenever $n(x_a) > 0$;
3. $L(T_a p) \geq L(p)$, so the likelihood never decreases;
4. $\hat{p}$ is the unique fixed point of $T_a$ (and $S$);
5. $\overline{\mathcal{P}}_{\mathcal{A}}$ is compact.

A small implementation sketch of the iteration follows.
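To make the scaling concrete, here is a minimal Python sketch of IPS for a table stored as a dictionary over cells. The function names, data layout, and fixed cycle count are our own illustrative choices (a sketch, not the loglin interface); a production implementation would test for convergence rather than run a fixed number of cycles.

    from itertools import product

    def marginal(table, a):
        """The a-marginal g(x_a): sum g(y) over all cells y with y_a = x_a."""
        m = {}
        for cell, value in table.items():
            key = tuple(cell[i] for i in a)
            m[key] = m.get(key, 0.0) + value
        return m

    def ips(counts, generators, dims, cycles=10):
        """Iterative proportional scaling: cycle the operations T_a over A."""
        n = sum(counts.values())
        cells = list(product(*[range(d) for d in dims]))
        p = {cell: 1.0 / len(cells) for cell in cells}    # p_0(x) = 1/|X|
        for _ in range(cycles):                           # p_m = S p_{m-1}
            for a in generators:                          # S = T_ak ... T_a1
                n_marg = marginal(counts, a)
                p_marg = marginal(p, a)
                for cell in cells:
                    key = tuple(cell[i] for i in a)
                    denom = n * p_marg.get(key, 0.0)
                    # (T_a p)(x) = p(x) n(x_a) / {n p(x_a)}, with 0/0 = 0
                    p[cell] = p[cell] * n_marg.get(key, 0.0) / denom if denom > 0 else 0.0
        return p

For the admissions example of the following slides (axis 0 = Sex with Male = 0, Female = 1; axis 1 = Admitted with Yes = 0, No = 1, and the aggregate counts shown below):

    counts = {(0, 0): 1198, (0, 1): 1493, (1, 0): 557, (1, 1): 1278}
    p_hat = ips(counts, generators=[(0,), (1,)], dims=(2, 2))
    # Each n * p_hat[cell] matches the fitted table of the example below.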

21 A simple example

                  Admitted
    Sex           Yes     No      S-marginal
    Male          1198    1493    2691
    Female         557    1278    1835
    A-marginal    1755    2771    4526

Admissions data from Berkeley. Consider $A \perp\!\!\!\perp S$, corresponding to $\mathcal{A} = \{\{A\}, \{S\}\}$. We should fit the $A$-marginal and $S$-marginal iteratively.

22 Initial values

                  Admitted
    Sex           Yes       No        S-marginal
    Male          1131.5    1131.5    2263
    Female        1131.5    1131.5    2263
    A-marginal    2263      2263      4526

Entries are all equal to $4526/4 = 1131.5$; this gives the initial values of $np_0$.

23 Fitting the S-marginal

                  Admitted
    Sex           Yes       No        S-marginal
    Male          1345.5    1345.5    2691
    Female         917.5     917.5    1835
    A-marginal    2263      2263      4526

For example $1131.5 \times 2691/2263 = 1345.5$, and so on.

24 Fitting the A-marginal

                  Admitted
    Sex           Yes       No        S-marginal
    Male          1043.5    1647.5    2691
    Female         711.5    1123.5    1835
    A-marginal    1755      2771      4526

For example $1345.5 \times 1755/2263 = 1043.5$, and so on. The algorithm has converged, as both marginals now fit!

25 Normalised to probabilities

                  Admitted
    Sex           Yes      No       S-marginal
    Male          0.231    0.364    0.595
    Female        0.157    0.248    0.405
    A-marginal    0.388    0.612    1.000

Dividing everything by 4526 yields $\hat{p}$. It is overkill to use the IPS algorithm here, as there is an explicit formula, as we shall see next time.
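Indeed, running the IPS sketch above on this table reproduces these values after a single cycle, consistent with the explicit product-of-margins fit for an independence model: $\hat{p}(x) = \hat{p}(x_A)\hat{p}(x_S)$, e.g. $0.388 \times 0.595 \approx 0.231$ for (Yes, Male).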

26 To summarize:

- $L$ attains its maximum on $\overline{\mathcal{P}}_{\mathcal{A}}$.
- The maximizer $\hat{p} \in \overline{\mathcal{P}}_{\mathcal{A}}$ of $L$ is unique.
- The maximizer $\hat{p}$ is determined as the unique solution within $\overline{\mathcal{P}}_{\mathcal{A}}$ of the equations $n\hat{p}(x_a) = n(x_a)$, $a \in \mathcal{A}$, $x_a \in \mathcal{X}_a$.
- The maximizer is the limit of the convergent repeated fitting of marginals $T_a$: $p(x) \leftarrow p(x)\,n(x_a)/\{n\,p(x_a)\}$, $x \in \mathcal{X}$, $a \in \mathcal{A}$.
