Maximum likelihood in log-linear models
- Brianna Brooks
Graphical Models, Lecture 4, Michaelmas Term 2010, October 22, 2010
Generating class | Dependence graph of log-linear model | Conformal graphical models | Factor graphs

Generating class

Let $\mathcal{A}$ denote an arbitrary set of subsets of $V$. A density $f$ (or function) factorizes w.r.t. $\mathcal{A}$ if there exist functions $\psi_a(x)$, each depending on $x$ only through $x_a$, such that
$$f(x) = \prod_{a \in \mathcal{A}} \psi_a(x).$$
This is similar to factorization w.r.t. a graph, but the sets in $\mathcal{A}$ need not be complete subsets of a graph. The set $\mathcal{P}_\mathcal{A}$ of distributions which factorize w.r.t. $\mathcal{A}$ is the hierarchical log-linear model generated by $\mathcal{A}$. To avoid redundancy, it is common to assume that the sets in $\mathcal{A}$ are incomparable, in the sense that no member of $\mathcal{A}$ is contained in any other member of $\mathcal{A}$. $\mathcal{A}$ is the generating class of the log-linear model.
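A minimal Python sketch of these definitions, assuming a generating class is given as a collection of sets of variable names and each factor $\psi_a$ as a function of the sub-configuration $x_a$ (both representation choices, and the helper names `maximal_sets` and `factorized_value`, are hypothetical). It drops non-maximal generators so that the remaining sets are incomparable, and evaluates a factorized function.

```python
from math import prod


def maximal_sets(A):
    """Reduce a collection of variable sets to its maximal members.

    Any set strictly contained in another member is dropped, so the result
    is pairwise incomparable, as assumed for a generating class.
    """
    sets = {frozenset(a) for a in A}
    return {a for a in sets if not any(a < b for b in sets)}


def factorized_value(x, psi):
    """Evaluate f(x) = prod_{a in A} psi_a(x_a).

    x   : dict mapping each variable name to its value
    psi : dict mapping frozenset(a) to a function of the sub-dict x_a,
          so each factor only sees the coordinates in a
    """
    return prod(fn({v: x[v] for v in a}) for a, fn in psi.items())


# Hypothetical example: {I} is redundant because it lies inside {I, J}.
A = maximal_sets([{"I"}, {"I", "J"}, {"J", "K"}])
psi = {
    frozenset({"I", "J"}): lambda xa: 2.0 if xa["I"] == xa["J"] else 1.0,
    frozenset({"J", "K"}): lambda xa: 3.0,
}
value = factorized_value({"I": 0, "J": 0, "K": 1}, psi)
```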
Dependence graph of log-linear model

For any generating class $\mathcal{A}$ we construct the dependence graph $G(\mathcal{A}) = G(\mathcal{P}_\mathcal{A})$ of the log-linear model $\mathcal{P}_\mathcal{A}$. Since the pairwise Markov property has to hold for all members of $\mathcal{P}_\mathcal{A}$, it must in particular hold for all positive members. The dependence graph is determined by the relation
$$\alpha \sim \beta \iff \exists\, a \in \mathcal{A} : \alpha, \beta \in a.$$
The sets in $\mathcal{A}$ are clearly complete in $G(\mathcal{A})$, and therefore distributions in $\mathcal{P}_\mathcal{A}$ factorize according to $G(\mathcal{A})$. On the other hand, no graph with fewer edges would suffice. The distributions in $\mathcal{P}_\mathcal{A}$ thus also satisfy the global, local, and pairwise Markov properties w.r.t. $G(\mathcal{A})$.
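The relation defining $G(\mathcal{A})$ is easy to compute directly. The sketch below (hypothetical helper names, variables as strings) builds the edge set and confirms that every generator is a complete subset, for the simple chain $\mathcal{A} = \{\{I,J\},\{J,K\}\}$.

```python
from itertools import combinations


def dependence_graph(A):
    """Return (vertices, edges) of G(A).

    alpha ~ beta  iff  some generator a in A contains both alpha and beta.
    Edges are stored as frozensets {alpha, beta}.
    """
    generators = [frozenset(a) for a in A]
    vertices = set().union(*generators)
    edges = {
        frozenset(pair)
        for a in generators
        for pair in combinations(sorted(a), 2)
    }
    return vertices, edges


def is_complete(S, edges):
    """True if every pair of distinct vertices in S is joined by an edge."""
    return all(frozenset(pair) in edges for pair in combinations(sorted(S), 2))


# Hypothetical generating class: a chain I - J - K.
V, E = dependence_graph([{"I", "J"}, {"J", "K"}])
```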
Conformal graphical models

Just as a generating class $\mathcal{A}$ defines a dependence graph $G(\mathcal{A})$, a graph conversely defines a generating class: the set $\mathcal{C}(G)$ of cliques (maximal complete subsets) of $G$ is a generating class for the log-linear model of distributions which factorize w.r.t. $G$. If the dependence graph completely summarizes the restrictions imposed by $\mathcal{A}$, i.e. if $\mathcal{A} = \mathcal{C}(G(\mathcal{A}))$, then $\mathcal{A}$ is conformal.
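Conformality can be checked mechanically by comparing $\mathcal{A}$ with the cliques of $G(\mathcal{A})$. This sketch finds cliques with the basic Bron-Kerbosch recursion, one standard choice among several; the function names are hypothetical and the approach is only meant for small graphs.

```python
from itertools import combinations


def dependence_graph(A):
    """Vertices and edges of G(A): alpha ~ beta iff both lie in some a."""
    generators = [frozenset(a) for a in A]
    vertices = set().union(*generators)
    edges = {frozenset(p) for a in generators for p in combinations(sorted(a), 2)}
    return vertices, edges


def maximal_cliques(vertices, edges):
    """Cliques (maximal complete subsets) via basic Bron-Kerbosch recursion."""
    adj = {v: set() for v in vertices}
    for e in edges:
        u, w = tuple(e)
        adj[u].add(w)
        adj[w].add(u)
    cliques = set()

    def extend(R, P, X):
        if not P and not X:
            cliques.add(frozenset(R))  # R cannot be enlarged: it is maximal
            return
        for v in list(P):
            extend(R | {v}, P & adj[v], X & adj[v])
            P = P - {v}
            X = X | {v}

    extend(set(), set(vertices), set())
    return cliques


def is_conformal(A):
    """A is conformal iff A = C(G(A)), i.e. its members are exactly the cliques."""
    return {frozenset(a) for a in A} == maximal_cliques(*dependence_graph(A))
```

The triangle class $\{\{I,J\},\{I,K\},\{J,K\}\}$ comes out non-conformal: its dependence graph is complete on $\{I,J,K\}$, whose single clique $\{I,J,K\}$ is not a member of $\mathcal{A}$.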
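Factor graphs

[Figure: factor graph with variable nodes $I$, $J$, $K$ and factor nodes $\phi_{IJ}$, $\phi_{IK}$, $\phi_{JK}$.]

The factor graph of $\mathcal{A}$ is the bipartite graph with vertex set $V \cup \mathcal{A}$ and edges defined by
$$\alpha \sim a \iff \alpha \in a.$$
Using this graph, even non-conformal log-linear models admit a simple visual representation. In the figure, $\mathcal{A} = \{\{I,J\},\{I,K\},\{J,K\}\}$ is not conformal, since $G(\mathcal{A})$ is the complete graph on $\{I,J,K\}$, whose only clique is $\{I,J,K\}$.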
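A factor graph is just a bipartite edge list. The sketch below, with hypothetical names, builds it for the triangle class with factors $\phi_{IJ}$, $\phi_{IK}$, $\phi_{JK}$; unlike $G(\mathcal{A})$, this representation keeps the three pairwise factors distinct from a single three-way factor.

```python
def factor_graph(A):
    """Bipartite factor graph of a generating class A.

    The vertex set is V (variables) together with A (factors); there is an
    edge (alpha, a) exactly when alpha is in a.
    """
    factors = [frozenset(a) for a in A]
    variables = set().union(*factors)
    edges = {(alpha, a) for a in factors for alpha in a}
    return variables, factors, edges


# The non-conformal triangle class with factors phi_IJ, phi_IK, phi_JK.
variables, factors, edges = factor_graph([{"I", "J"}, {"I", "K"}, {"J", "K"}])
degree = {v: sum(1 for alpha, _ in edges if alpha == v) for v in variables}
```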
Log linear models

Data in list form

Consider a sample $X^1 = x^1, \ldots, X^n = x^n$ from a distribution with probability mass function $p$. We refer to such data as being in list form, e.g. as

case  Admitted  Sex
1     Yes       Male
2     Yes       Female
3     No        Male
4     Yes       Male
...   ...       ...
Contingency table

Data are often presented in the form of a contingency table or cross-classification, obtained from the list by sorting according to category, e.g. a table cross-classifying Sex (Male, Female) by Admitted (Yes, No). The numerical entries are the cell counts
$$n(x) = |\{\nu : x^\nu = x\}|,$$
and the total number of observations is $n = \sum_{x \in \mathcal{X}} n(x)$.
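Passing from list form to cell counts amounts to counting identical records. A sketch using Python's `collections.Counter` on the four cases shown above (the helper name `contingency_table` is hypothetical):

```python
from collections import Counter

# List form: one (Admitted, Sex) record per case; only the four cases
# displayed above are used here.
cases = [
    ("Yes", "Male"),
    ("Yes", "Female"),
    ("No", "Male"),
    ("Yes", "Male"),
]


def contingency_table(records):
    """Cell counts n(x) = |{nu : x^nu = x}|; unseen cells count as 0."""
    return Counter(records)


table = contingency_table(cases)
n = sum(table.values())  # total number of observations
```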
Assume now that $p \in \mathcal{P}_\mathcal{A}$ but is otherwise unknown. The likelihood function can be expressed as
$$L(p) = \prod_{\nu=1}^n p(x^\nu) = \prod_{x \in \mathcal{X}} p(x)^{n(x)}.$$
In contingency table form the data follow a multinomial distribution
$$P\{N(x) = n(x),\ x \in \mathcal{X}\} = \frac{n!}{\prod_{x \in \mathcal{X}} n(x)!} \prod_{x \in \mathcal{X}} p(x)^{n(x)},$$
but this only affects the likelihood function by a constant factor.
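The two expressions for the likelihood, and the multinomial constant, can be compared numerically. In this sketch the distribution `p` is a hypothetical choice and the data are the four listed cases; the log-likelihoods from list form and table form agree, and the multinomial coefficient $4!/(2!\,1!\,1!) = 12$ is the constant factor that does not depend on $p$.

```python
from collections import Counter
from math import lgamma, log

cases = [("Yes", "Male"), ("Yes", "Female"), ("No", "Male"), ("Yes", "Male")]
# Hypothetical distribution p on the four cells (Admitted, Sex).
p = {
    ("Yes", "Male"): 0.4,
    ("Yes", "Female"): 0.2,
    ("No", "Male"): 0.3,
    ("No", "Female"): 0.1,
}


def loglik_list(records, p):
    """log L(p) = sum_nu log p(x^nu), computed from the data in list form."""
    return sum(log(p[x]) for x in records)


def loglik_table(counts, p):
    """log L(p) = sum_x n(x) log p(x), computed from the contingency table."""
    return sum(nx * log(p[x]) for x, nx in counts.items() if nx > 0)


def log_multinomial_coef(counts):
    """log of n! / prod_x n(x)!, the constant factor in the multinomial pmf."""
    n = sum(counts.values())
    return lgamma(n + 1) - sum(lgamma(nx + 1) for nx in counts.values())


counts = Counter(cases)
```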
The likelihood function
$$L(p) = \prod_{x \in \mathcal{X}} p(x)^{n(x)}$$
is continuous as a function of the unknown probability distribution $p$ (an $|\mathcal{X}|$-dimensional vector). Since the closure $\bar{\mathcal{P}}_\mathcal{A}$ is compact (bounded and closed), $L$ attains its maximum on $\bar{\mathcal{P}}_\mathcal{A}$. Unfortunately, $\mathcal{P}_\mathcal{A}$ itself is not closed, so limits of factorizing distributions do not necessarily factorize. The maximum of the likelihood function is therefore not necessarily attained on $\mathcal{P}_\mathcal{A}$ itself, and in general it is necessary to include the boundary points.
Indeed, $L$ also has a unique maximum over $\bar{\mathcal{P}}_\mathcal{A}$, which we shall now show. For simplicity, we only establish uniqueness within $\mathcal{P}_\mathcal{A}$. The proof is indirect, but quite simple. Assume $p_1, p_2 \in \mathcal{P}_\mathcal{A}$ with $p_1 \neq p_2$ and
$$L(p_1) = L(p_2) = \sup_{p \in \mathcal{P}_\mathcal{A}} L(p). \qquad (1)$$
Define
$$p_{12}(x) = c\sqrt{p_1(x)p_2(x)},$$
where $c^{-1} = \sum_x \sqrt{p_1(x)p_2(x)}$ is a normalizing constant.
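Then $p_{12} \in \mathcal{P}_\mathcal{A}$ because
$$p_{12}(x) = c\sqrt{p_1(x)p_2(x)} = c\prod_{a \in \mathcal{A}} \sqrt{\psi^1_a(x)\psi^2_a(x)} = \prod_{a \in \mathcal{A}} \psi^{12}_a(x),$$
where e.g. $\psi^{12}_a(x) = c^{1/|\mathcal{A}|}\sqrt{\psi^1_a(x)\psi^2_a(x)}$. The Cauchy-Schwarz inequality yields
$$c^{-1} = \sum_x \sqrt{p_1(x)}\sqrt{p_2(x)} < \sqrt{\sum_x p_1(x)}\sqrt{\sum_x p_2(x)} = 1,$$
i.e. we have $c > 1$. The inequality is strict because $p_1 \neq p_2$ are both probability vectors and hence not proportional.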
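Hence
$$L(p_{12}) = \prod_x p_{12}(x)^{n(x)} = \prod_x \left\{c\sqrt{p_1(x)p_2(x)}\right\}^{n(x)} = c^n \sqrt{\prod_x p_1(x)^{n(x)} \prod_x p_2(x)^{n(x)}} = c^n\sqrt{L(p_1)L(p_2)} > \sqrt{L(p_1)L(p_2)} = L(p_1) = L(p_2),$$
which contradicts (1). Hence we conclude $p_1 = p_2$. The extension to $\bar{\mathcal{P}}_\mathcal{A}$ is almost identical; it just needs a limit argument to establish that $p_1, p_2 \in \bar{\mathcal{P}}_\mathcal{A} \Rightarrow p_{12} \in \bar{\mathcal{P}}_\mathcal{A}$.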
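The two facts the proof relies on, $c > 1$ and $L(p_{12}) = c^n\sqrt{L(p_1)L(p_2)}$, are easy to check numerically. The sketch below uses the independence model on a 2 × 2 table, hypothetical counts and two hypothetical members $p_1 \neq p_2$; it also checks that $p_{12}$ is again a product distribution (its cross-products agree). It illustrates the argument rather than replacing it.

```python
from math import log, sqrt


def product_dist(row, col):
    """A distribution in the independence model: p(i, j) = row[i] * col[j]."""
    return {(i, j): row[i] * col[j] for i in range(len(row)) for j in range(len(col))}


def geometric_mean(p1, p2):
    """p12(x) = c * sqrt(p1(x) p2(x)), with c^{-1} = sum_x sqrt(p1(x) p2(x))."""
    c = 1.0 / sum(sqrt(p1[x] * p2[x]) for x in p1)
    return {x: c * sqrt(p1[x] * p2[x]) for x in p1}, c


def loglik(counts, p):
    """log L(p) = sum_x n(x) log p(x)."""
    return sum(nx * log(p[x]) for x, nx in counts.items() if nx > 0)


# Hypothetical counts and two distinct members of the independence model.
counts = {(0, 0): 3, (0, 1): 1, (1, 0): 2, (1, 1): 4}
n = sum(counts.values())
p1 = product_dist((0.6, 0.4), (0.7, 0.3))
p2 = product_dist((0.2, 0.8), (0.5, 0.5))
p12, c = geometric_mean(p1, p2)
```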
The maximum likelihood estimate $\hat p$ of $p$ is the unique element of $\bar{\mathcal{P}}_\mathcal{A}$ which satisfies the system of equations
$$n\hat p(x_a) = n(x_a), \quad a \in \mathcal{A},\ x_a \in \mathcal{X}_a. \qquad (2)$$
Here $g(x_a) = \sum_{y:\, y_a = x_a} g(y)$ is the $a$-marginal of the function $g$. The system of equations (2) expresses the fitting of the marginals in $\mathcal{A}$. It can be seen as an instance of the fact that in an exponential family (log-linear $\iff$ exponential), the MLE is found by equating the sufficient statistics (marginal counts) to their expectation.
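The marginals $g(x_a)$ and the equations (2) translate directly into code. In this sketch cells are tuples, each generator is a tuple of coordinate positions, and the counts are hypothetical; for the independence model the candidate $\hat p(i,j) = n(i)n(j)/n^2$ satisfies (2), while the uniform distribution does not. The helper names are illustrative only.

```python
from collections import defaultdict


def marginal(g, a):
    """a-marginal g(x_a) = sum_{y: y_a = x_a} g(y).

    Cells are tuples and a is a tuple of coordinate positions.
    """
    out = defaultdict(float)
    for y, value in g.items():
        out[tuple(y[i] for i in a)] += value
    return dict(out)


def fits_marginals(p_hat, counts, A, tol=1e-9):
    """Check the likelihood equations n p_hat(x_a) = n(x_a) for all a, x_a."""
    n = sum(counts.values())
    for a in A:
        pm, nm = marginal(p_hat, a), marginal(counts, a)
        for xa in set(pm) | set(nm):
            if abs(n * pm.get(xa, 0.0) - nm.get(xa, 0.0)) > tol:
                return False
    return True


# Hypothetical 2 x 2 counts and the independence model A = {{0}, {1}}.
counts = {(0, 0): 10, (0, 1): 20, (1, 0): 30, (1, 1): 40}
A = [(0,), (1,)]
n = sum(counts.values())
rows, cols = marginal(counts, (0,)), marginal(counts, (1,))
# Candidate estimate p_hat(i, j) = n(i) n(j) / n^2.
p_hat = {(i, j): rows[(i,)] * cols[(j,)] / n**2 for (i, j) in counts}
uniform = {x: 1 / len(counts) for x in counts}
```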
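Proof: Assume $p^* \in \mathcal{P}_\mathcal{A}$ is a solution to the equations (2). That $p^*$ maximizes the likelihood function follows from the calculation below, where $p \in \mathcal{P}_\mathcal{A}$ is arbitrary and $\phi_a = \log\psi_a$:
$$\begin{aligned}
\log L(p) &= \sum_{x \in \mathcal{X}} n(x)\log p(x) = \sum_{x \in \mathcal{X}} n(x)\sum_{a \in \mathcal{A}} \phi_a(x)\\
&= \sum_{a \in \mathcal{A}}\sum_{x \in \mathcal{X}} n(x)\phi_a(x)
= \sum_{a \in \mathcal{A}}\sum_{x_a \in \mathcal{X}_a}\sum_{y:\, y_a = x_a} n(y)\phi_a(y)\\
&= \sum_{a \in \mathcal{A}}\sum_{x_a \in \mathcal{X}_a} n(x_a)\phi_a(x_a),
\end{aligned}$$
as $\phi_a(y)$ depends on $y$ only through $y_a$ and $n(x_a) = \sum_{y:\, y_a = x_a} n(y)$.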
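Further, using (2), we get
$$\begin{aligned}
\log L(p) &= \sum_{a \in \mathcal{A}}\sum_{x_a \in \mathcal{X}_a} n(x_a)\phi_a(x_a) = \sum_{a \in \mathcal{A}}\sum_{x_a \in \mathcal{X}_a} np^*(x_a)\phi_a(x_a)\\
&= \sum_{a \in \mathcal{A}}\sum_{x \in \mathcal{X}} np^*(x)\phi_a(x) = \sum_{x \in \mathcal{X}} np^*(x)\log p(x).
\end{aligned}$$
Thus, for any $p \in \mathcal{P}_\mathcal{A}$ we have established that
$$\log L(p) = \sum_{x \in \mathcal{X}} np^*(x)\log p(x).$$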
This is in particular also true for $p^*$. The information inequality now yields
$$\log L(p) = \sum_{x \in \mathcal{X}} np^*(x)\log p(x) \le \sum_{x \in \mathcal{X}} np^*(x)\log p^*(x) = \log L(p^*).$$
The case $p \in \bar{\mathcal{P}}_\mathcal{A}$ needs an additional limit argument.
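As a numerical sanity check of the identity $\log L(p) = \sum_x np^*(x)\log p(x)$ and of the final inequality, the sketch below again uses the independence model with hypothetical counts, takes $p^*$ as the product of the observed marginal proportions, and compares it with a few hypothetical product distributions $p$ (the last one coincides with $p^*$).

```python
from math import log

# Hypothetical 2 x 2 counts; model: independence, A = {{0}, {1}}.
counts = {(0, 0): 10, (0, 1): 20, (1, 0): 30, (1, 1): 40}
n = sum(counts.values())
row = {i: sum(v for (r, _), v in counts.items() if r == i) for i in (0, 1)}
col = {j: sum(v for (_, s), v in counts.items() if s == j) for j in (0, 1)}

# p_star fits both marginals and lies in the model, so it solves (2).
p_star = {(i, j): row[i] * col[j] / n**2 for (i, j) in counts}


def loglik(p):
    """log L(p) = sum_x n(x) log p(x)."""
    return sum(nx * log(p[x]) for x, nx in counts.items())


def product_dist(a, b):
    """A member of the model: p(i, j) = a[i] * b[j]."""
    return {(i, j): a[i] * b[j] for i in (0, 1) for j in (0, 1)}


# A few hypothetical members of the model to compare against p_star.
others = [
    product_dist((0.5, 0.5), (0.25, 0.75)),
    product_dist((0.1, 0.9), (0.6, 0.4)),
    product_dist((0.3, 0.7), (0.4, 0.6)),
]
```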
To show that the equations (2) indeed have a solution, we simply describe a convergent algorithm which solves them. It cycles repeatedly through all the $a$-marginals in $\mathcal{A}$ and fits them one by one. For $a \in \mathcal{A}$ define the following scaling operation on $p$:
$$(T_a p)(x) \leftarrow p(x)\frac{n(x_a)}{np(x_a)}, \quad x \in \mathcal{X},$$
where $0/0 = 0$ and $b/0$ is undefined if $b \neq 0$.
Fitting the marginals

The operation $T_a$ fits the $a$-marginal if $p(x_a) > 0$ whenever $n(x_a) > 0$:
$$n(T_a p)(x_a) = n\sum_{y:\, y_a = x_a} p(y)\frac{n(y_a)}{np(y_a)} = n\frac{n(x_a)}{np(x_a)}\sum_{y:\, y_a = x_a} p(y) = n\frac{n(x_a)}{np(x_a)}p(x_a) = n(x_a).$$
Consequently, we have $T_a^2 = T_a$: there is no reason to apply it twice.
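A direct transcription of $T_a$, assuming tables are dicts from cell tuples to probabilities and $a$ is a tuple of coordinate positions (hypothetical conventions and helper names). The example fits one marginal of hypothetical 2 × 2 counts from the uniform start and confirms $T_a^2 = T_a$.

```python
from collections import defaultdict


def marginal(g, a):
    """a-marginal: sum of g(y) over all cells y with y_a = x_a."""
    out = defaultdict(float)
    for y, value in g.items():
        out[tuple(y[i] for i in a)] += value
    return out


def scale(p, counts, a):
    """The scaling T_a: (T_a p)(x) = p(x) n(x_a) / (n p(x_a)).

    Uses the convention 0/0 = 0 and raises if n(x_a) > 0 while p(x_a) = 0,
    where b/0 is undefined.
    """
    n = sum(counts.values())
    pm, nm = marginal(p, a), marginal(counts, a)
    out = {}
    for x, px in p.items():
        xa = tuple(x[i] for i in a)
        num, den = nm.get(xa, 0.0), n * pm[xa]
        if den > 0:
            out[x] = px * num / den
        elif num == 0:
            out[x] = 0.0
        else:
            raise ValueError(f"n(x_a) > 0 but p(x_a) = 0 at x_a = {xa}")
    return out


# Hypothetical 2 x 2 counts (n = 100) and a uniform starting distribution.
counts = {(0, 0): 10, (0, 1): 20, (1, 0): 30, (1, 1): 40}
p = {x: 0.25 for x in counts}
q = scale(p, counts, (0,))   # fit the marginal of coordinate 0
q2 = scale(q, counts, (0,))  # applying T_a again changes nothing
```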
Make an ordering of the generators $\mathcal{A} = \{a_1, \ldots, a_k\}$. Define $S$ by a full cycle of scalings
$$Sp = T_{a_k}\cdots T_{a_2}T_{a_1}p.$$
Define the iteration
$$p_0(x) \leftarrow 1/|\mathcal{X}|, \qquad p_n = Sp_{n-1}, \quad n = 1, \ldots.$$
It then holds that
$$\lim_{n\to\infty} p_n = \hat p,$$
where $\hat p$ is the unique maximum likelihood estimate of $p \in \bar{\mathcal{P}}_\mathcal{A}$, i.e. the solution of the equation system (2).
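This is known as the IPS algorithm (iterative proportional scaling) or IPF algorithm (iterative proportional fitting), among a variety of other names. It is implemented, e.g. (inefficiently), in R in loglin, with front end loglm in MASS. Key elements in the proof:
1. If $p \in \bar{\mathcal{P}}_\mathcal{A}$, so is $T_a p$;
2. $T_a$ is continuous at any point $p$ of $\bar{\mathcal{P}}_\mathcal{A}$ with $p(x_a) > 0$ whenever $n(x_a) > 0$;
3. $L(T_a p) \geq L(p)$, so the likelihood always increases;
4. $\hat p$ is the unique fixpoint for $S$ (and the only common fixpoint of the $T_a$);
5. $\bar{\mathcal{P}}_\mathcal{A}$ is compact.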
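Putting the pieces together, here is a small sketch of the IPS iteration from the uniform $p_0$, cycling through $\mathcal{A}$ in a fixed order until a full cycle changes no cell by more than a tolerance. The stopping rule, the tolerance, the helper names and the 2 × 2 × 2 counts are hypothetical choices, and the code favours clarity over efficiency; it is not the R loglin implementation. The recorded log-likelihoods allow a check of element 3 above, that the likelihood never decreases.

```python
from collections import defaultdict
from itertools import product
from math import log


def marginal(g, a):
    """a-marginal: sum g(y) over all cells y with y_a = x_a."""
    out = defaultdict(float)
    for y, value in g.items():
        out[tuple(y[i] for i in a)] += value
    return out


def scale(p, counts, a, n):
    """One scaling T_a: p(x) * n(x_a) / (n p(x_a)), with 0/0 = 0."""
    pm, nm = marginal(p, a), marginal(counts, a)
    out = {}
    for x, px in p.items():
        xa = tuple(x[i] for i in a)
        den = n * pm[xa]
        if den > 0:
            out[x] = px * nm.get(xa, 0.0) / den
        elif nm.get(xa, 0.0) == 0:
            out[x] = 0.0
        else:
            raise ValueError("b/0 with b != 0 is undefined")
    return out


def loglik(counts, p):
    """log L(p) = sum_x n(x) log p(x) over cells with n(x) > 0."""
    return sum(nx * log(p[x]) for x, nx in counts.items() if nx > 0)


def ips(counts, A, cells, tol=1e-10, max_cycles=10_000):
    """Iterate p_n = S p_{n-1} from the uniform p_0 until a full cycle changes
    no cell by more than tol; also record log L after every single scaling."""
    n = sum(counts.values())
    p = {x: 1.0 / len(cells) for x in cells}
    history = [loglik(counts, p)]
    for _ in range(max_cycles):
        previous = p
        for a in A:  # one full cycle S = T_{a_k} ... T_{a_1}
            p = scale(p, counts, a, n)
            history.append(loglik(counts, p))
        if max(abs(p[x] - previous[x]) for x in cells) < tol:
            break
    return p, history


# Hypothetical 2 x 2 x 2 counts; A = {{0,1},{0,2},{1,2}} (non-conformal).
cells = list(product((0, 1), repeat=3))
counts = dict(zip(cells, [10, 5, 8, 12, 6, 9, 7, 3]))
A = [(0, 1), (0, 2), (1, 2)]
p_hat, history = ips(counts, A, cells)
```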
A simple example

Admissions data from Berkeley, arranged as a 2 × 2 table of Sex (Male, Female) by Admitted (Yes, No), bordered by its S-marginal and A-marginal. Consider $A \perp\!\!\!\perp S$, corresponding to $\mathcal{A} = \{\{A\}, \{S\}\}$. We should fit the A-marginal and the S-marginal iteratively.
Initial values

All entries are set equal to $4526/4 = 1131.5$, giving the initial values of $np_0$.
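Fitting the S-marginal

Each entry of $np_0$ is rescaled so that the fitted S-marginal matches the observed S-marginal. For example, $np_1(\text{Male}, \text{Yes}) = 1131.5 \cdot n(\text{Male})/2263 = n(\text{Male})/2$, and so on.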
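Fitting the A-marginal

Next each entry is rescaled to fit the A-marginal. For example, $np_2(\text{Male}, \text{Yes}) = \frac{n(\text{Male})}{2}\cdot\frac{n(\text{Yes})}{n/2} = n(\text{Male})\,n(\text{Yes})/n$, and so on. The algorithm has converged, as both marginals now fit!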
Normalised to probabilities

Dividing all entries of the fitted table by 4526 yields $\hat p$. Using the IPS algorithm here is overkill, as there is an explicit formula, as we shall see next time.
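The example can be replayed in a few lines, working with fitted counts $np$ rather than $p$ (equivalent, since $T_a$ only involves ratios). The cell counts are an assumption, as the text above does not list them: the sketch uses the aggregated Berkeley data as in R's UCBAdmissions summed over departments, which totals 4526 as above. After the S-fit each cell equals $n(s)/2$, and after the A-fit it equals $n(s)n(a)/n$, the explicit formula, so a second cycle changes nothing.

```python
# ASSUMPTION: aggregated Berkeley counts, i.e. R's UCBAdmissions summed
# over departments; keys are (Sex, Admitted) cells.
counts = {
    ("Male", "Yes"): 1198,
    ("Male", "No"): 1493,
    ("Female", "Yes"): 557,
    ("Female", "No"): 1278,
}
n = sum(counts.values())  # 4526
SEX, ADMITTED = 0, 1  # coordinate positions in each cell


def margin(table, pos, level):
    """Marginal total of `table` over all cells whose coordinate `pos` is `level`."""
    return sum(v for cell, v in table.items() if cell[pos] == level)


def fit(fitted, pos):
    """Scale fitted counts m = n p so their `pos`-marginal matches the data:
    m(x) * n(x_a) / m(x_a), which is T_a written for m instead of p."""
    return {
        cell: m * margin(counts, pos, cell[pos]) / margin(fitted, pos, cell[pos])
        for cell, m in fitted.items()
    }


fitted0 = {cell: n / 4 for cell in counts}  # initial values n p_0 = 4526/4
fitted1 = fit(fitted0, SEX)                 # fit the S-marginal
fitted2 = fit(fitted1, ADMITTED)            # fit the A-marginal
p_hat = {cell: m / n for cell, m in fitted2.items()}
```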
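$L$ attains its maximum on $\bar{\mathcal{P}}_\mathcal{A}$.
The maximizer $\hat p \in \bar{\mathcal{P}}_\mathcal{A}$ of $L$ is unique.
The maximizer $\hat p$ is determined as the unique solution within $\bar{\mathcal{P}}_\mathcal{A}$ of the equations
$$n\hat p(x_a) = n(x_a), \quad a \in \mathcal{A},\ x_a \in \mathcal{X}_a.$$
The maximizer is the limit of the convergent repeated fitting of marginals
$$T_a : p(x) \leftarrow p(x)\,n(x_a)/\{np(x_a)\}, \quad x \in \mathcal{X},\ a \in \mathcal{A}.$$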
A LOCALIZATION PROPERTY AT THE BOUNDARY FOR MONGE-AMPERE EQUATION O. SAVIN. Introduction In this paper we study the geometry of the sections for solutions to the Monge- Ampere equation det D 2 u = f, u
More informationCSC Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming
CSC2411 - Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming Notes taken by Mike Jamieson March 28, 2005 Summary: In this lecture, we introduce semidefinite programming
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationEnvelope Theorems for Arbitrary Parametrized Choice Sets
Envelope Theorems for Arbitrary Parametrized Choice Sets Antoine LOEPER 1 and Paul Milgrom January 2009 (PRELIMINARY) 1 Managerial Economics and Decision Sciences, Kellogg School of Management, Northwestern
More informationGoing from graphic solutions to algebraic
Going from graphic solutions to algebraic 2 variables: Graph constraints Identify corner points of feasible area Find which corner point has best objective value More variables: Think about constraints
More information11 The Max-Product Algorithm
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms for Inference Fall 2014 11 The Max-Product Algorithm In the previous lecture, we introduced
More informationMeasure Theory on Topological Spaces. Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond
Measure Theory on Topological Spaces Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond May 22, 2011 Contents 1 Introduction 2 1.1 The Riemann Integral........................................ 2 1.2 Measurable..............................................
More informationOn John type ellipsoids
On John type ellipsoids B. Klartag Tel Aviv University Abstract Given an arbitrary convex symmetric body K R n, we construct a natural and non-trivial continuous map u K which associates ellipsoids to
More information