Joint Gaussian Graphical Model Review Series I


Joint Gaussian Graphical Model Review Series I: Probability Foundations. Beilun Wang, Advisor: Yanjun Qi. Department of Computer Science, University of Virginia. June 23rd, 2017.

Outline: 1. Notation 2. Probability 3. Dependence and Correlation 4. Conditional Dependence and Partial Correlation

Notation

Notation: P, the probability measure. Ω, the sample space. F, the set of events. X, Y, Z, random variables.

Probability

Probability Space. Let (Ω, F, P) be a probability space: Ω is an arbitrary non-empty set; F ⊆ 2^Ω is the set of events; P is the probability measure, i.e., a function from F to [0, 1].

Events. F contains Ω; F is closed under complements; F is closed under countable unions.

Probability Measure. A probability measure P satisfies P(Ω) = 1 and is countably additive: the probability of a countable union of disjoint events is the sum of their probabilities.

Random Variable. Let X : Ω → R be a random variable; X is a measurable function.

Probability Distribution. The probability distribution function F : R → [0, 1] is defined by F(x) = P[X < x] for x ∈ R. If X = Y, do they follow the same distribution? (Yes.) If F_X = F_Y, does X = Y? (Not necessarily: equal distributions do not imply equal random variables.)

Joint Probability. The probability distribution of the random vector (X, Y).

Marginal Probability. Given a pair of random variables (X, Y), the marginal probability is the probability distribution of X alone.

Conditional Distribution. Given the information of Y, the probability distribution of X. Notation: X | Y.

Relationship. P(X = x, Y = y) = P(Y = y) P(X = x | Y = y).

Dependence and Correlation

Independence. X and Y are independent if and only if p_{X,Y}(x, y) = p_X(x) p_Y(y), where p is the probability density function. Equivalently, the conditional distribution Y | X equals the distribution of Y. Example: two independent coin flips. Independence is closely tied to the absence of a causal relationship.

Correlation. Covariance: Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)], where µ_X and µ_Y are the means of X and Y. Correlation: ρ(X, Y) = Cov(X, Y) / (σ_X σ_Y). Correlation measures the linear dependency between X and Y: ρ(X, Y) = 1 means X and Y lie in the same linear direction (as X increases, Y increases, with all points on the same line); ρ(X, Y) = −1 means they lie in the reverse linear direction; ρ(X, Y) = 0 means X and Y are orthogonal (uncorrelated).
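A minimal numerical sketch of these definitions (assuming NumPy is available; the data below are synthetic and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2.0 * x + rng.normal(scale=0.5, size=1000)     # roughly linear in x

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))  # Cov(X, Y)
rho = cov_xy / (x.std() * y.std())                 # rho(X, Y) = Cov / (sigma_X * sigma_Y)
print(rho, np.corrcoef(x, y)[0, 1])                # the two values agree
```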

Dependence and Correlation. Correlation is a value that is easy to estimate, while independence is a relationship that has to be inferred. Dependence is a stronger relationship than correlation: if X and Y are independent, then ρ(X, Y) = 0, but the reverse does not hold. For example, let X be symmetrically distributed about zero and Y = X²: they are uncorrelated yet dependent.
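A quick numerical check of this counterexample (a sketch assuming NumPy): X is symmetric about zero and Y = X² is fully determined by X, yet their sample correlation is close to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)     # symmetric about zero
y = x ** 2                       # a deterministic function of x, hence dependent on x

print(np.corrcoef(x, y)[0, 1])   # approximately 0: uncorrelated but not independent
```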

Gaussian Example. The density of the bivariate Gaussian is: f(x, y) = 1 / (2π σ_X σ_Y √(1 − ρ²)) · exp( −1/(2(1 − ρ²)) [ (x − µ_X)²/σ_X² + (y − µ_Y)²/σ_Y² − 2ρ(x − µ_X)(y − µ_Y)/(σ_X σ_Y) ] ).   (3.1)

Gaussian Example. Suppose X and Y are uncorrelated, i.e., (X, Y) ~ N(µ, diag(σ_X², σ_Y²)). Then f(x, y) = 1/(2π σ_X σ_Y) exp( −(1/2) [ (x − µ_X)²/σ_X² + (y − µ_Y)²/σ_Y² ] ) = [1/(√(2π) σ_X) exp( −(x − µ_X)²/(2σ_X²) )] · [1/(√(2π) σ_Y) exp( −(y − µ_Y)²/(2σ_Y²) )] = f(x) f(y).   (3.2) Therefore, if (X, Y) follows a bivariate Gaussian, X and Y are uncorrelated if and only if they are independent.

Summary. Correlation is a value that is easy to estimate, while independence is a relationship that has to be inferred. In the Gaussian case, the two are equivalent. From the structure-learning angle, dependence is about the causal relationship, while correlation is, more specifically, the linear relationship.

Conditional Dependence and Partial Correlation

Conditional Dependence. Consider a more complicated case with a third random variable Z. There are two ways to view conditional independence: X and Y are independent conditional on Z; equivalently, X | Z and Y | Z are independent. Formally, X and Y are conditionally independent given Z if and only if p_{X,Y|Z}(x, y) = p_{X|Z}(x) p_{Y|Z}(y), where p is the probability density function.

Partial Correlation. Formally, the partial correlation between X and Y given the random variable Z, written ρ_{XY·Z}, is the correlation between the residuals R_X and R_Y resulting from the linear regression of X on Z and of Y on Z, respectively.

Partial Correlation Calculation. Suppose P = Σ^(-1) (Σ is the covariance matrix or correlation matrix). Then ρ_{X_i X_j · V\{X_i, X_j}} = −p_ij / √(p_ii p_jj). The value is determined exactly by the precision matrix (the inverse of the covariance matrix)!
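A small sketch of this calculation (assuming NumPy; the data are synthetic): invert the covariance matrix and read the partial correlation off the precision matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))        # 500 samples, 4 variables
S = np.cov(X, rowvar=False)          # sample covariance matrix (Sigma)
P = np.linalg.inv(S)                 # precision matrix

def partial_corr(P, i, j):
    # partial correlation between variables i and j given all the others
    return -P[i, j] / np.sqrt(P[i, i] * P[j, j])

print(partial_corr(P, 0, 1))
```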

Conditional Dependence and Partial Correlation. Similarly, in the Gaussian case they are equivalent. A detailed derivation is in the next talk.

Gaussian Case. Partial correlation is a value that is easy to estimate, while conditional independence is a relationship that has to be inferred. Conditional dependence is a stronger relationship than partial correlation: if X | Z and Y | Z are independent, then ρ(X, Y | Z) = 0, but the reverse does not hold.

Summary. Partial correlation is a value that is easy to estimate, while conditional independence is a relationship that has to be inferred. In the Gaussian case, the two are equivalent. From the structure-learning angle, conditional dependence is about the causal relationship, while partial correlation is, more specifically, the linear relationship.

Joint Gaussian Graphical Model Review Series II: Gaussian Graphical Model Basics. Beilun Wang, Advisor: Yanjun Qi. Department of Computer Science, University of Virginia. June 30th, 2017.

Outline: 1. Notation 2. Reviews 3. Why are partial correlation and conditional dependence equivalent in the Gaussian case? 4. Maximum Likelihood Method 5. Regression Method

Notation

Notation: Σ, the covariance matrix. Ω, the precision matrix. µ, the mean vector. x_i, the i-th sample, drawn from a multivariate normal distribution.

Reviews

Reviews: probability basics; dependency vs. correlation; conditional dependency vs. partial correlation.

Summary from last talk. Partial correlation is a value that is easy to estimate, while conditional independence is a relationship that has to be inferred. In the Gaussian case, the two are equivalent. From the structure-learning angle, conditional dependence is about the causal relationship, while partial correlation is, more specifically, the linear relationship. So the remaining questions are why they are equivalent in the Gaussian case and how to infer this relationship.

Review: Gaussian Example. Suppose X and Y are uncorrelated, i.e., (X, Y) ~ N(µ, diag(σ_X², σ_Y²)). Then f(x, y) = 1/(2π σ_X σ_Y) exp( −(1/2) [ (x − µ_X)²/σ_X² + (y − µ_Y)²/σ_Y² ] ) = [1/(√(2π) σ_X) exp( −(x − µ_X)²/(2σ_X²) )] · [1/(√(2π) σ_Y) exp( −(y − µ_Y)²/(2σ_Y²) )] = f(x) f(y).   (2.1) Therefore, if (X, Y) follows a bivariate Gaussian, X and Y are uncorrelated if and only if they are independent.

Why are partial correlation and conditional dependence equivalent in the Gaussian case?

Multivariate Gaussian Distribution. Density function: let X ~ N(µ, Σ). Then f(x) = (2π)^(−p/2) det(Σ)^(−1/2) exp( −(1/2)(x − µ)^T Σ^(-1) (x − µ) ).
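A short sketch checking this density formula against SciPy (assuming NumPy and SciPy; the numbers are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
x = np.array([0.5, 0.5])

# closed-form density: (2*pi)^(-p/2) * det(Sigma)^(-1/2) * exp(-0.5 * (x-mu)' Sigma^-1 (x-mu))
p = len(mu)
diff = x - mu
quad = diff @ np.linalg.inv(Sigma) @ diff
f = (2 * np.pi) ** (-p / 2) * np.linalg.det(Sigma) ** (-0.5) * np.exp(-0.5 * quad)

print(f, multivariate_normal(mean=mu, cov=Sigma).pdf(x))   # the two values match
```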

Partition X, µ, Σ, and Ω: µ = [µ_1; µ_2], Σ = [Σ_11, Σ_12; Σ_21, Σ_22], Ω = Σ^(-1) = [Ω_11, Ω_12; Ω_21, Ω_22], X = [X_1; X_2].

Conditional Distribution of the Multivariate Gaussian. If X ~ N(µ, Σ), it holds that X_2 ~ N(µ_2, Σ_22). If Σ_22 is regular (invertible), it further holds that X_1 | (X_2 = a) ~ N(µ_{1|2}, Σ_{1|2}), where µ_{1|2} = µ_1 + Σ_12 Σ_22^(-1) (a − µ_2) and Σ_{1|2} = Σ_11 − Σ_12 Σ_22^(-1) Σ_21 = (Ω_11)^(-1).
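A minimal sketch of these block formulas (assuming NumPy; the indices and numbers are illustrative): compute µ_{1|2} and Σ_{1|2}, and check that Σ_{1|2} equals the inverse of the corresponding block of the precision matrix.

```python
import numpy as np

mu = np.array([0.0, 0.0, 0.0])
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])

idx1, idx2 = [0], [1, 2]                  # X1 = first variable, X2 = the remaining two
a = np.array([1.0, -0.5])                 # observed value of X2

S11 = Sigma[np.ix_(idx1, idx1)]
S12 = Sigma[np.ix_(idx1, idx2)]
S22 = Sigma[np.ix_(idx2, idx2)]

mu_cond = mu[idx1] + S12 @ np.linalg.solve(S22, a - mu[idx2])
Sigma_cond = S11 - S12 @ np.linalg.solve(S22, S12.T)

Omega = np.linalg.inv(Sigma)              # precision matrix
print(mu_cond, Sigma_cond, 1.0 / Omega[0, 0])   # Sigma_cond equals (Omega_11)^-1
```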

Partial correlation and conditional dependence are equivalent in the Gaussian case. Since X_1 | X_2 = a ~ N(µ_{1|2}, (Ω_11)^(-1)): if X_1 contains only x_i and x_j, then x_i and x_j are conditionally independent given all other variables if and only if Ω_ij = 0.

Estimate the conditional dependence graph / partial correlations. The only thing left is to estimate Ω = Σ^(-1); we call this problem the Gaussian Graphical Model. There are three potential approaches: (1) directly invert the sample covariance matrix Σ̂ — not possible when the sample covariance matrix is not invertible (e.g., when p > n); (2) the maximum likelihood method; (3) the regression method.

Maximum Likelihood Method

The MLE of µ. L(µ, Ω) = (2π)^(−np/2) ∏_{i=1}^n det(Ω^(-1))^(−1/2) exp( −(1/2)(x_i − µ)^T Ω (x_i − µ) ). Taking the first derivative with respect to µ and setting it to zero, it is easy to show that the MLE is the sample mean x̄ = (x_1 + ··· + x_n)/n.

The Likelihood of Ω. L(x̄, Ω) = (2π)^(−np/2) ∏_{i=1}^n det(Ω^(-1))^(−1/2) exp( −(1/2)(x_i − x̄)^T Ω (x_i − x̄) ). Notice that (x_i − x̄)^T Ω (x_i − x̄) is a scalar, so (x_i − x̄)^T Ω (x_i − x̄) = trace((x_i − x̄)^T Ω (x_i − x̄)).

The Likelihood of Ω. Since tr(AB) = tr(BA):
L(x̄, Ω) ∝ det(Ω^(-1))^(−n/2) exp( −(1/2) Σ_{i=1}^n tr( (x_i − x̄)^T Ω (x_i − x̄) ) )   (4.1)
= det(Ω^(-1))^(−n/2) exp( −(1/2) Σ_{i=1}^n tr( (x_i − x̄)(x_i − x̄)^T Ω ) )   (4.2)
= det(Ω^(-1))^(−n/2) exp( −(1/2) tr( (Σ_{i=1}^n (x_i − x̄)(x_i − x̄)^T) Ω ) )   (4.3)
= det(Ω^(-1))^(−n/2) exp( −(1/2) tr(SΩ) ),   (4.4)
where S = Σ_{i=1}^n (x_i − x̄)(x_i − x̄)^T ∈ R^{p×p}.

The Log-Likelihood of Ω. ln L(x̄, Ω) = const − (n/2) ln det(Ω^(-1)) − (1/2) tr( Ω Σ_{i=1}^n (x_i − x̄)(x_i − x̄)^T ). Since det(A^(-1)) = 1/det(A),
ln L(x̄, Ω) ∝ ln det(Ω) − tr( Ω · (1/n) Σ_{i=1}^n (x_i − x̄)(x_i − x̄)^T )   (4.5)
= ln det(Ω) − tr(Ω Ŝ),   (4.6)
where Ŝ is the sample covariance matrix.
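A small sketch evaluating this profile log-likelihood (assuming NumPy; constants are dropped as on the slide):

```python
import numpy as np

def precision_log_likelihood(Omega, S_hat):
    # ln det(Omega) - tr(Omega @ S_hat), up to constants
    sign, logdet = np.linalg.slogdet(Omega)
    return logdet - np.trace(Omega @ S_hat)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
S_hat = np.cov(X, rowvar=False)                     # sample covariance
print(precision_log_likelihood(np.linalg.inv(S_hat), S_hat))
```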

Regression Method

Partial Correlation. As we know, partial correlations can also be obtained by linear regression. In the Gaussian case, we can use the so-called neighborhood approach.

Conditional Distribution of the Multivariate Gaussian. If X ~ N(µ, Σ), it holds that X_2 ~ N(µ_2, Σ_22). If Σ_22 is regular (invertible), X_1 | X_2 = a ~ N(µ_{1|2}, Σ_{1|2}), where µ_{1|2} = µ_1 + Σ_12 Σ_22^(-1) (a − µ_2) and Σ_{1|2} = Σ_11 − Σ_12 Σ_22^(-1) Σ_21 = (Ω_11)^(-1).

Neighborhood approach. If X ~ N(0, Σ), take X_1 = X_j. Then X_j | X_{\j} ~ N( α_j^T X_{\j}, Σ_jj − Σ_{j,\j} Σ_{\j,\j}^(-1) Σ_{\j,j} ), with α_j := Σ_{\j,\j}^(-1) Σ_{\j,j} and σ_j² := Σ_jj − Σ_{j,\j} Σ_{\j,\j}^(-1) Σ_{\j,j}. We have X_j = α_j^T X_{\j} + ε_j,   (5.1) where ε_j ~ N(0, σ_j²) is independent of X_{\j}.

Neighborhood approach. We can estimate the α_j by solving p simple linear regressions. If the i-th entry of α_j equals 0, then X_i and X_j are partially uncorrelated and conditionally independent. Often we want additional assumptions on α_j, such as sparsity.

Summary. In the Gaussian case, partial correlation and conditional dependence are equivalent. We have two ways to estimate them: first, directly estimate the precision matrix by MLE; second, solve p linear regression problems via the neighborhood approach. Neither imposes any assumption on the partial correlation coefficients. In the next talk, we introduce the solutions of these two estimators.

Joint Gaussian Graphical Model Review Series III: Markov Random Field and Log-Linear Model. Beilun Wang, Advisor: Yanjun Qi. Department of Computer Science, University of Virginia. July 7th, 2017.

Outline: 1. Why do we need Graphical Models? 2. Graphical Model 3. Markov Random Field

Road Map

Review: Gaussian Case. In the Gaussian case, we know that conditional dependence and partial correlation are equivalent. This pairwise relationship can be naturally represented by a graph G = (V, E): the precision matrix Ω ≻ 0 acts as a natural adjacency matrix. We call this pairwise conditional dependence relationship among variables an undirected graphical model.

Why do we need Graphical Models?

A Toy Example

A Toy Example. Suppose X = (X_1, X_2, X_3, X_4, X_5, X_6), where each variable takes value 0 or 1. To estimate the joint probability p(X) directly, you need to estimate 2^6 values. However, if we know the conditional independence graph and p(X) = p(X_1, X_2, X_3) p(X_4, X_5, X_6), you only need to estimate 2 · 2^3 = 2^4 values.

Proof of the decomposition. First, let us prove that if X_1 ⊥ X_3 | X_2, then p(X_1 | X_3, X_2) = p(X_1 | X_2). Indeed, p(X_1 | X_2) p(X_3 | X_2) = p(X_1, X_3 | X_2) = p(X_1 | X_3, X_2) p(X_3 | X_2); cancelling p(X_3 | X_2) on both sides gives the conclusion. It is easy to obtain the analogous result under the local Markov property: p(X_v | X_{V\{v}}) = p(X_v | X_{N(v)}).

Proof of the decomposition. By the chain rule, p(X_1, X_2, X_3, X_4, X_5, X_6) = p(X_1 | X_2, ..., X_6) p(X_2 | X_3, ..., X_6) p(X_3 | X_4, X_5, X_6) p(X_4, X_5, X_6). By the conclusion on the previous slide, this equals p(X_1 | X_2, X_3) p(X_2 | X_3) p(X_3) p(X_4, X_5, X_6)   (1.1) = p(X_1, X_2, X_3) p(X_4, X_5, X_6).   (1.2)

Graphical Model

Graphical Model. Probability inference: estimate joint, marginal, and conditional probabilities. Structure learning: given a dataset X, learn the graph structure from X (i.e., learn the edge patterns between variables).

A Toy Example

Probability Inference: calculating a marginal probability. We know that p(X) = p(X_1, X_2, X_3) p(X_4, X_5, X_6). Traditionally, p(X_1, X_2 = a) = Σ_{X_3, X_4, X_5, X_6} p(X_1, X_2 = a, X_3, X_4, X_5, X_6): 16 operations. Using the graph, p(X_1, X_2 = a) = Σ_{X_3} p(X_1, X_2 = a, X_3) · Σ_{X_4, X_5, X_6} p(X_4, X_5, X_6): 10 operations.

Markov Random Field

Markov Random Field. Given an undirected graph G = (V, E), a set of random variables X = (X_v)_{v∈V} indexed by V forms a Markov random field with respect to G if they satisfy the local Markov property: a variable is conditionally independent of all other variables given its neighbors, X_v ⊥ X_{V\({v}∪N(v))} | X_{N(v)}. This property is stronger than the pairwise Markov property: any two non-adjacent variables are conditionally independent given all other variables, X_u ⊥ X_v | X_{V\{u,v}} whenever {u, v} ∉ E.

A Toy Example

Clique factorization. If the joint density can be factorized over the cliques of G, p(X = x) = ∏_{C ∈ cl(G)} φ_C(x_C), then X forms a Markov random field with respect to G. Here cl(G) is the set of cliques of G.

A Toy Example

Log-linear Model. Any Markov random field can be written as a log-linear model with feature functions f_k, such that the full joint distribution is P(X = x) = (1/Z) exp( Σ_k w_k f_k(x) ). Notice that the reverse does not hold.

Example I: Pairwise Model. P(X = x) = (1/Z(Θ)) exp( Σ_{s∈V} θ_s x_s² + Σ_{(s,t)∈E} θ_st x_s x_t ). Examples: the Gaussian graphical model and the Ising model. These two models have good estimators for inferring the MRF. In general, estimating Θ is difficult, since it involves computing Z(Θ) or its derivatives.

Example I: Pairwise Model, Gaussian Case. f(x_1, ..., x_k) = exp( −(1/2)(x − µ)^T Σ^(-1) (x − µ) ) / √( (2π)^k det(Σ) ). Solution: ln L(x̄, Ω) ∝ ln det(Ω) − tr( Ω · (1/n) Σ_{i=1}^n (x_i − x̄)(x_i − x̄)^T )   (3.1) = ln det(Ω) − tr(Ω Ŝ),   (3.2) where Ŝ is the sample covariance matrix. For the Ising model, a generalized covariance matrix is used to avoid the normalization term.

Example II: Non-pairwise Model, the Nonparanormal Graphical Model. Are there any non-pairwise models that are easy to estimate? Nonparanormal graphical model: P(X = x) = (1/Z) exp( −(1/2)(f(x) − µ)^T Σ^(-1) (f(x) − µ) ), where f(X) = (f_1(X_1), f_2(X_2), ..., f_p(X_p)), each f_i is a univariate monotone function, and f(X) ~ N(µ, Σ).

Summary. The formal definition of the Markov random field (undirected graphical model); the general formulations: clique factorization and the log-linear model; two examples: the pairwise model and the nonparanormal graphical model. In the next talk, we introduce the solutions of these two estimators for sGGM.

Joint Gaussian Graphical Model Review Series IV: A Unified Framework for M-estimators and Elementary Estimators. Beilun Wang, Advisor: Yanjun Qi. Department of Computer Science, University of Virginia. July 21st, 2017.

Road Map

Outline: 1. Notation 2. Review 3. Regularized M-estimator 4. A unified framework 5. Elementary Estimator

Notation

Notation: L, the loss function. R, the regularization function (a norm). R*, the dual norm of R.

Review

Review from last talk: the likelihood of the precision matrix in the Gaussian case; graphical model basics.

Regularized M-estimator

Regularized M-estimator. In statistics, M-estimators are a broad class of estimators obtained as the minima of sums of functions of the data; the parameters are estimated as the argmin of such sums. Target: L(X, θ), the loss function. Conditions: R(θ), the regularization function. Therefore, the whole objective function is argmin_θ L(X, θ) + λ_n R(θ).   (3.1)

Example: Linear Model. Let us use the linear regression model as an example. Target: find β such that Xβ ≈ y. Constraints: sparsity. Prediction accuracy: sacrifice a little bias to reduce the variance and improve the overall performance. Interpretation: with a large number of predictors, we often would like to determine a smaller subset that exhibits the strongest effects. argmin_β ||y − Xβ||²   (3.2), subject to ||β||_0 ≤ t.   (3.3)

Example: Lasso. Since the ℓ_0-norm is not a convex function, we replace it with its closest convex surrogate, the ℓ_1-norm: argmin_β ||y − Xβ||²   (3.4), subject to ||β||_1 ≤ t.   (3.5) Lasso: argmin_β ||y − Xβ||² + λ_n ||β||_1.
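A usage sketch of the lasso with scikit-learn (sklearn.linear_model.Lasso; the synthetic data and penalty value are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
beta_true = np.zeros(20)
beta_true[:3] = [2.0, -1.5, 1.0]                  # sparse ground truth
y = X @ beta_true + 0.1 * rng.normal(size=100)

model = Lasso(alpha=0.1).fit(X, y)                # alpha plays the role of lambda_n
print(np.nonzero(model.coef_)[0])                 # recovered support
```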

Other equivalent formulations. argmin_β ||β||_1   (3.6), subject to y = Xβ.   (3.7) Dantzig selector: argmin_β ||β||_1   (3.8), subject to ||X^T (Xβ − y)||_∞ ≤ λ_n.   (3.9)

A unified framework

Three major criteria. Statistical convergence rate: how close the estimated parameter is to the true parameter; it corresponds to estimation error and approximation error. Computational complexity: how fast the algorithm is with respect to certain parameters, e.g., n and p. Optimization rate of convergence: how fast each optimization step moves toward the estimated parameter (e.g., linear or quadratic). Traditional statisticians focus on the statistical convergence rate (accuracy).

High dimension vs. low dimension. Low dimension: when n is large, the error is asymptotically 0 by the law of large numbers. High dimension (i.e., p/n → c > 0): the error is not asymptotically 0. High-dimensional analysis is relatively hard; traditionally, we need a careful proof for every estimator.

A unified framework for M-estimators [Negahban et al. (2009)]. Decomposability of R: given a subspace M ⊆ R^p, a norm-based regularizer R is decomposable with respect to (M, M^⊥) if R(θ + γ) = R(θ) + R(γ) for all θ ∈ M and γ ∈ M^⊥, where M^⊥ := {v ∈ R^p : ⟨u, v⟩ = 0 for all u ∈ M}. Subspace compatibility constant: Φ(M) := sup_{u ∈ M\{0}} R(u)/||u|| with respect to the pair (R, ||·||).

A unified framework for M-estimators [Negahban et al. (2009)]. Example: the ℓ_1 norm is decomposable, and Φ(M) = √s with respect to the pair (ℓ_1, ℓ_2), where s is the size of the support.

Example: Lasso. ||θ̂_{λ_n} − θ*||_2² ≤ O(s log p / n). In the high-dimensional setting, the sparsity assumption actually improves the convergence rate a lot.

Elementary Estimator

We now have a very powerful tool to easily prove convergence rates, and a similar process proves the convergence rate for estimators like the Dantzig selector. However, many statistical methods are slow when p and n are large and are not scalable at all. Are there estimators with a closed-form solution that also achieve the optimal convergence rate?

Elementary Estimator [Yang et al. (2014b)]. Here B*(φ̂) is a backward mapping for φ. argmin_θ R(θ)   (5.1), subject to R*(θ − B*(φ̂)) ≤ λ_n.   (5.2)

Example: sparse linear regression [Yang et al. (2014a)]. argmin_θ ||θ||_1   (5.3), subject to ||θ − (X^T X + εI)^(-1) X^T y||_∞ ≤ λ_n.   (5.4)
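A minimal sketch of this estimator (assuming NumPy; since the constraint in (5.3)–(5.4) is an ℓ∞ ball around the ridge-like backward mapping, the ℓ_1-minimal feasible point is obtained by elementwise soft-thresholding):

```python
import numpy as np

def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def elementary_sparse_regression(X, y, eps=0.1, lam=0.05):
    # backward mapping: a ridge-regularized least-squares estimate
    p = X.shape[1]
    theta_init = np.linalg.solve(X.T @ X + eps * np.eye(p), X.T @ y)
    # closed form: soft-threshold the backward mapping at lambda_n
    return soft_threshold(theta_init, lam)
```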

Summary. We reviewed the unified framework for M-estimators, which can be applied to most regularized M-estimation problems. Following a similar proof strategy, we obtain the family of elementary estimators.

References I
S. Negahban, B. Yu, M. J. Wainwright, and P. K. Ravikumar. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. In Advances in Neural Information Processing Systems, 2009.
E. Yang, A. Lozano, and P. Ravikumar. Elementary estimators for high-dimensional linear regression. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), 2014a.
E. Yang, A. C. Lozano, and P. K. Ravikumar. Elementary estimators for graphical models. In Advances in Neural Information Processing Systems, 2014b.

Joint Gaussian Graphical Model Series V: Sparse Gaussian Graphical Model Estimators. Beilun Wang, Advisor: Yanjun Qi. Department of Computer Science, University of Virginia. July 28th, 2017.

Road Map

Outline: 1. Notation 2. Review 3. Neighborhood Method 4. Graphical Lasso 5. CLIME 6. Elementary Estimator for the Gaussian Graphical Model

Notation

Notation: Σ, the covariance matrix. Ω, the precision matrix. p, the number of features. n, the number of samples.

Review

Review from last talk. Regularized M-estimator: argmin_θ L(θ) + λ_n R(θ); a unified framework to analyze the statistical convergence rate for high-dimensional statistics; elementary estimators.

Review of the Gaussian Graphical Model. Suppose the precision matrix is Ω = Σ^(-1). The log-likelihood of Ω is proportional to ln det(Ω) − tr(Ω Ŝ). In this talk, we use this likelihood to derive several estimators of the sparse Gaussian Graphical Model (sGGM).

Neighborhood Method

Neighborhood approach. If X ~ N(0, Σ), take X_1 = X_j. Then X_j | X_{·,\j} ~ N( α_j^T X_{·,\j}, Σ_jj − Σ_{j,\j} Σ_{\j,\j}^(-1) Σ_{\j,j} ), with α_j := Σ_{\j,\j}^(-1) Σ_{\j,j} and σ_j² := Σ_jj − Σ_{j,\j} Σ_{\j,\j}^(-1) Σ_{\j,j}. We have X_j = α_j^T X_{·,\j} + ε_j,   (3.1) where ε_j ~ N(0, σ_j²) is independent of X_{·,\j}.

Neighborhood approach with the sparsity assumption. Under the sparsity assumption, we estimate each α_j by a lasso estimator: α̂_j = argmin_{α_j} ||X_{·,\j} α_j − X_{·,j}||² + λ ||α_j||_1.   (3.2)
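A sketch of this node-wise lasso (assuming NumPy and scikit-learn; the function name and penalty value are illustrative): one lasso regression per variable, regressing X_j on all the other columns.

```python
import numpy as np
from sklearn.linear_model import Lasso

def neighborhood_selection(X, lam=0.1):
    """Regress each variable on all the others with an l1 penalty.
    Returns a p x p coefficient matrix with a zero diagonal."""
    n, p = X.shape
    coefs = np.zeros((p, p))
    for j in range(p):
        others = np.delete(np.arange(p), j)
        fit = Lasso(alpha=lam).fit(X[:, others], X[:, j])
        coefs[j, others] = fit.coef_
    # a nonzero entry (j, i) suggests an edge between variables j and i
    return coefs
```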

Review of the lasso solution: subgradient method. β̂ = argmin_β ||y − Xβ||² + λ||β||_1.   (3.3) The subgradient is g(β; λ) = −2X^T (y − Xβ) + λ sgn(β).   (3.4)

Review of the lasso solution: state of the art. The proximity operator is important because x* is a minimizer of min_{x∈H} F(x) + R(x) if and only if x* = prox_{γR}(x* − γ∇F(x*)), where γ is any positive real number. Proximal gradient method: for the ℓ_1 penalty, the prox is soft-thresholding, (prox_{γR}(x))_i = x_i − γ if x_i > γ; 0 if |x_i| ≤ γ; x_i + γ if x_i < −γ.   (3.5) By using the fixed-point iteration, you can obtain the estimate of β.
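A minimal sketch of this fixed-point (proximal gradient / ISTA) iteration for the lasso (assuming NumPy; the step size uses the standard Lipschitz bound):

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lasso(X, y, lam=0.1, n_iter=500):
    """Proximal gradient iteration for  ||y - X beta||^2 + lam * ||beta||_1."""
    L = 2.0 * np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the gradient
    gamma = 1.0 / L
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ beta - y)
        beta = soft_threshold(beta - gamma * grad, gamma * lam)   # prox step
    return beta
```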

Graphical Lasso

Graphical Lasso [Friedman et al. (2008)]. We already have the log-likelihood as the loss function. Can we use it to obtain an estimator similar to the lasso? argmin_Ω −ln det(Ω) + tr(Ω Ŝ) + λ_n ||Ω||_1.   (4.1)
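In practice, scikit-learn ships a solver for this objective (sklearn.covariance.GraphicalLasso); a usage sketch with synthetic data and an illustrative penalty value:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

model = GraphicalLasso(alpha=0.1).fit(X)    # alpha is the l1 penalty (lambda_n)
Omega_hat = model.precision_                # estimated sparse precision matrix
print(np.count_nonzero(Omega_hat))
```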

Proximal gradient method to solve it. Let us do a practice on the whiteboard. This is a superlinear algorithm: lim_{k→∞} ||x_{k+1} − x*|| / ||x_k − x*|| = 0.

State-of-the-art method: BIG & QUIC [Hsieh et al. (2011)]. Parallelized coordinate descent; an approximate quadratic (second-order) algorithm: lim_{k→∞} ||x_{k+1} − x*|| / ||x_k − x*||² < M.

CLIME

CLIME [Cai et al. (2011)]. argmin_Ω ||Ω||_1, subject to ||Σ̂ Ω − I||_∞ ≤ λ.   (5.1) Here λ > 0 is the tuning parameter.

By taking the first derivative of Eq. (4.1) and setting it equal to zero, the solution Ω̂_glasso also satisfies Ω̂_glasso^(-1) − Σ̂ = λ Ẑ,   (5.2) where Ẑ is an element of the subdifferential ∂||Ω̂_glasso||_1.

Column-wise estimator. argmin_β ||β||_1, subject to ||Σ̂ β − e_j||_∞ ≤ λ. CLIME can be estimated column by column.
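A hedged sketch of one CLIME column as a linear program (assuming NumPy and SciPy; the variable split β = u − v with u, v ≥ 0 turns the ℓ_1 objective and ℓ∞ constraint into linear ones):

```python
import numpy as np
from scipy.optimize import linprog

def clime_column(Sigma_hat, j, lam):
    """min ||beta||_1  s.t.  ||Sigma_hat @ beta - e_j||_inf <= lam."""
    p = Sigma_hat.shape[0]
    e_j = np.zeros(p)
    e_j[j] = 1.0
    c = np.ones(2 * p)                                    # objective: sum(u) + sum(v)
    A = np.vstack([np.hstack([ Sigma_hat, -Sigma_hat]),   #  Sigma_hat beta - e_j <= lam
                   np.hstack([-Sigma_hat,  Sigma_hat])])  # -(Sigma_hat beta - e_j) <= lam
    b = np.concatenate([lam + e_j, lam - e_j])
    res = linprog(c, A_ub=A, b_ub=b)                      # default bounds give u, v >= 0
    u, v = res.x[:p], res.x[p:]
    return u - v
```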

Elementary Estimator for the Gaussian Graphical Model

Elementary Estimator [Yang et al. (2014b)]. Here B*(φ̂) is a backward mapping for φ. argmin_θ R(θ)   (6.1), subject to R*(θ − B*(φ̂)) ≤ λ_n.   (6.2)

Example: sparse linear regression [Yang et al. (2014a)]. argmin_θ ||θ||_1   (6.3), subject to ||θ − (X^T X + εI)^(-1) X^T y||_∞ ≤ λ_n.   (6.4)

Elementary Estimator for sGGM. argmin_Ω ||Ω||_{1,off}   (6.5), subject to ||Ω − [T_v(Σ̂)]^(-1)||_{∞,off} ≤ λ_n.
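A minimal sketch of this estimator (assuming NumPy; here T_v is taken to be elementwise soft-thresholding of the off-diagonal entries of the sample covariance, and the constrained problem is solved by soft-thresholding the off-diagonal of the backward mapping, following the elementary-estimator recipe above):

```python
import numpy as np

def soft_threshold_off_diag(A, t):
    """Soft-threshold the off-diagonal entries of A; keep the diagonal."""
    out = np.sign(A) * np.maximum(np.abs(A) - t, 0.0)
    np.fill_diagonal(out, np.diag(A))
    return out

def elementary_sggm(Sigma_hat, v=0.1, lam=0.05):
    # backward mapping: invert a thresholded (better-conditioned) covariance
    B = np.linalg.inv(soft_threshold_off_diag(Sigma_hat, v))
    # closed form of (6.5): soft-threshold its off-diagonal entries at lambda_n
    return soft_threshold_off_diag(B, lam)
```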

Summary. We reviewed the main sGGM estimators.

References I
T. Cai, W. Liu, and X. Luo. A constrained l1 minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106(494), 2011.
J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 2008.
C.-J. Hsieh, M. A. Sustik, I. S. Dhillon, and P. D. Ravikumar. Sparse inverse covariance matrix estimation using quadratic approximation. In NIPS, 2011.

References II
E. Yang, A. Lozano, and P. Ravikumar. Elementary estimators for high-dimensional linear regression. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), 2014a.
E. Yang, A. C. Lozano, and P. K. Ravikumar. Elementary estimators for graphical models. In Advances in Neural Information Processing Systems, 2014b.

Joint Gaussian Graphical Model Series VI: Multi-task sGGMs and Their Optimization Challenges. Beilun Wang, Advisor: Yanjun Qi. Department of Computer Science, University of Virginia. August 4th, 2017.

Road Map

Outline: 1. Notation 2. Review 3. Multi-task Learning 4. Multi-task sGGMs 5. Optimization Challenges of Multi-task sGGMs 6. Joint Graphical Lasso Example

Notation

Notation: X^(i), the i-th data matrix. Σ^(i), the i-th covariance matrix. Ω^(i), the i-th precision matrix. p, the number of features. n_i, the number of samples in the i-th data matrix. K, the number of tasks.

Review

Review from last talk. We introduced four estimators of the sparse Gaussian Graphical Model, which completes most of the material on the single-task sparse Gaussian Graphical Model covered in the last five talks.

Review of the Gaussian Graphical Model. Suppose the precision matrix is Ω = Σ^(-1). The log-likelihood of Ω is proportional to ln det(Ω) − tr(Ω Ŝ).

Multi-task Learning

Multi-task Learning. Multi-task learning (MTL) is a subfield of machine learning in which multiple learning tasks are solved at the same time, while exploiting commonalities and differences across tasks. This can result in improved learning efficiency and prediction accuracy for the task-specific models, compared to training the models separately.

Multi-task Learning: Linear Classifier Example. Linear classifier: f(x) = sgn(w^T x + b).   (3.1) Multi-task linear classifiers: for the i-th task, f_i(x) = sgn((w_S + w_i)^T x + b).   (3.2)

Multi-task sGGMs

Multi-task sGGMs: Problem. Input: {X^(i)}. Output: {Ω^(i)}. Assumption I: sparsity. Assumption II: commonalities and differences across tasks.

Multi-task sGGMs. Likelihood: Σ_i n_i ( ln det(Ω^(i)) − tr(Ω^(i) Ŝ^(i)) ).   (4.1) Likelihood with the sparsity assumption: argmax_{Ω^(i)} Σ_i n_i ( ln det(Ω^(i)) − tr(Ω^(i) Ŝ^(i)) )   (4.2), subject to ||Ω^(i)||_1 ≤ t.   (4.3)

Multi-task sGGMs. Likelihood with the multi-task setting: argmax_{Ω^(i)} Σ_i n_i ( ln det(Ω^(i)) − tr(Ω^(i) Ŝ^(i)) )   (4.4), subject to ||Ω^(i)||_1 ≤ t   (4.5) and P(Ω^(1), Ω^(2), ..., Ω^(K)) ≤ t_2.   (4.6) Joint Graphical Lasso [Danaher et al. (2013)]: argmin_{Ω^(i)} Σ_i n_i ( −ln det(Ω^(i)) + tr(Ω^(i) Ŝ^(i)) ) + λ_1 Σ_i ||Ω^(i)||_1 + λ_2 P(Ω^(1), Ω^(2), ..., Ω^(K)).   (4.7)

Optimization Challenges of Multi-task sGGMs

General formulation. Likelihood with the multi-task setting: argmin_{Ω^(i)} Σ_i n_i ( −ln det(Ω^(i)) + tr(Ω^(i) Ŝ^(i)) )   (5.1), subject to ||Ω^(i)||_1 ≤ t   (5.2) and P(Ω^(1), Ω^(2), ..., Ω^(K)) ≤ t_2.   (5.3) General formulation: min_{x,z} f(x) + g(z)   (5.4), subject to Ax + Bz = c.   (5.5)
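A generic ADMM skeleton for this general formulation (a sketch only: the subproblem solvers prox_f and prox_g and their signatures are assumptions, since they are problem-specific; for JGL, prox_f would correspond to the per-task Θ-update via the SVD step and prox_g to the penalized Z-update described in the JGL slides below):

```python
import numpy as np

def admm(prox_f, prox_g, A, B, c, n_iter=100):
    """Scaled-form ADMM for  min f(x) + g(z)  s.t.  Ax + Bz = c.
    prox_f(z, u) / prox_g(x, u) solve the x- and z-subproblems (assumed given)."""
    x = np.zeros(A.shape[1])
    z = np.zeros(B.shape[1])
    u = np.zeros(len(c))               # scaled dual variable
    for _ in range(n_iter):
        x = prox_f(z, u)               # x-update: minimize over x with z, u fixed
        z = prox_g(x, u)               # z-update: minimize over z with x, u fixed
        u = u + A @ x + B @ z - c      # dual ascent on the residual
    return x, z
```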

Optimization Challenge

Solution: distributed optimization (ADMM).

Optimization Challenges. For K > 2 tasks, you need to carefully derive the whole optimization solution. Each step in each iteration is itself a sub-optimization problem, which is sometimes already difficult to solve. This method achieves at most linear convergence.

Joint Graphical Lasso Example

JGL: group-lasso example

JGL solution: updating Θ^(i). Setting the gradient to 0, we obtain the SVD part of the solution.

JGL solution: updating Z^(i).

An example of the difficulty of ADMM

Summary. We introduced the multi-task sGGM problem, the optimization challenges of this problem, and the ADMM method and its drawbacks.

References I
P. Danaher, P. Wang, and D. M. Witten. The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2013.

Joint Gaussian Graphical Model Series VII: Multi-task sGGM Estimators. Beilun Wang, Advisor: Yanjun Qi. Department of Computer Science, University of Virginia. August 18th, 2017.

Road Map

Outline: 1. Notation 2. Review 3. Multi-task Learning 4. Multi-task sGGMs 5. Multi-task sGGM estimators: Joint Graphical Lasso; directly learning the commonalities and differences among tasks; directly learning the differences between case and control.

Notation

Notation: X^(i), the i-th data matrix. Σ^(i), the i-th covariance matrix. Ω^(i), the i-th precision matrix. p, the number of features. n_i, the number of samples in the i-th data matrix. K, the number of tasks.

Review

Review from last talk. We introduced multi-task learning of sparse Gaussian Graphical Models (sGGMs), the optimization challenges in multi-task sGGMs, and the ADMM method together with the solution of the Joint Graphical Lasso.

Review of the Gaussian Graphical Model. Suppose the precision matrix is Ω = Σ^(-1). The log-likelihood of Ω is proportional to ln det(Ω) − tr(Ω Ŝ).

Multi-task Learning

Multi-task Learning. Multi-task learning (MTL) is a subfield of machine learning in which multiple learning tasks are solved at the same time, while exploiting commonalities and differences across tasks. This can result in improved learning efficiency and prediction accuracy for the task-specific models, compared to training the models separately.

Multi-task Learning: Linear Classifier Example. Linear classifier: f(x) = sgn(w^T x + b).   (3.1) Multi-task linear classifiers: for the i-th task, f_i(x) = sgn((w_S + w_i)^T x + b).   (3.2)

Multi-task sGGMs

Multi-task sGGMs: Problem. Input: {X^(i)}. Output: {Ω^(i)}. Assumption I: sparsity. Assumption II: commonalities and differences across tasks.

Multi-task sGGMs. Likelihood: Σ_i n_i ( ln det(Ω^(i)) − tr(Ω^(i) Ŝ^(i)) ).   (4.1) Likelihood with the sparsity assumption: argmax_{Ω^(i)} Σ_i n_i ( ln det(Ω^(i)) − tr(Ω^(i) Ŝ^(i)) )   (4.2), subject to ||Ω^(i)||_1 ≤ t.   (4.3)

Multi-task sGGMs. Likelihood with the multi-task setting: argmax_{Ω^(i)} Σ_i n_i ( ln det(Ω^(i)) − tr(Ω^(i) Ŝ^(i)) )   (4.4), subject to ||Ω^(i)||_1 ≤ t   (4.5) and P(Ω^(1), Ω^(2), ..., Ω^(K)) ≤ t_2.   (4.6) Joint Graphical Lasso [Danaher et al. (2013)]: argmin_{Ω^(i)} Σ_i n_i ( −ln det(Ω^(i)) + tr(Ω^(i) Ŝ^(i)) ) + λ_1 Σ_i ||Ω^(i)||_1 + λ_2 P(Ω^(1), Ω^(2), ..., Ω^(K)).   (4.7)

Multi-task sGGM estimators

Multi-task sGGM estimators: Joint Graphical Lasso-type estimators; directly learning the commonalities and differences among tasks; directly learning the differences between case and control.

Joint Graphical Lasso estimators. Different joint graphical lassos: in the end, different multi-task sGGM estimators choose different penalties P(Ω^(1), Ω^(2), ..., Ω^(K)). Solutions: most methods use ADMM to solve the estimators.

JGL: Problem. Input: {X^(i)}. Output: {Ω^(i)}. Assumption I: sparsity. Assumption II: commonalities and differences.

Multi-task sGGM estimators. Group lasso [Danaher et al. (2013)]: P(Ω^(1), ..., Ω^(K)) = ||Ω^(1), Ω^(2), ..., Ω^(K)||_{G,2}, a group norm over tasks. SIMONE [Chiquet et al. (2011)]: a cooperative-lasso-style penalty built from the positive and negative parts of the entries across tasks, P(Ω^(1), ..., Ω^(K)) = Σ_{i≠j} [ ( Σ_{k=1}^K ((Ω^(k)_ij)_+)² )^{1/2} + ( Σ_{k=1}^K ((−Ω^(k)_ij)_+)² )^{1/2} ]. Node JGL [Mohan et al. (2013)]: P(Ω^(1), ..., Ω^(K)) = Σ_{i>j} RCON(Ω^(i) − Ω^(j)).

Node JGL [Mohan et al. (2013)]

Directly learning the commonalities and differences among tasks: Problem. Input: {X^(i)}. Output: {Ω^(i)_I, Ω_S}. Assumption I: sparsity. Assumption II: commonalities and differences.

Multi-task sGGM estimators: direct modeling. The second penalty function is still an indirect way to model the commonalities and differences among tasks; some works try to model this relationship directly. Mixed Neighborhood Selection (MNS) [Monti et al. (2015)]: the neighborhood edges of a given node v in the i-th task are modeled as β_v + b^(i)_v, where b^(i)_v ~ N(0, Φ_v). Starting from the CLIME estimator, we can directly model each graph as the sum of a commonality and a difference (SIMULE): Ω^(i) = εΩ_S + Ω^(i)_I.

Directly modeling commonalities and differences: SIMULE. (Ω^(1)_I, Ω^(2)_I, ..., Ω^(K)_I, Ω_S) = argmin_{Ω^(i)_I, Ω_S} Σ_i ||Ω^(i)_I||_1 + εK ||Ω_S||_1, subject to ||Σ̂^(i) (Ω^(i)_I + Ω_S) − I||_∞ ≤ λ_n, i = 1, ..., K.

Multi-task sGGM estimators: directly modeling the differential network. Problem. Input: {X^(i)}. Output: the differential network Δ. Assumption I: a sparse differential network.

Directly modeling the differential network I: Fused GLasso. By adding a regularization term to enforce the sparsity of Δ = Ω_c − Ω_d, we have the following formulation: argmin_{Ω_c, Ω_d ≻ 0} L(Ω_c) + L(Ω_d) + λ_n (||Ω_c||_1 + ||Ω_d||_1) + λ_2 ||Ω_c − Ω_d||_1.   (5.1) The fused lasso assumes that Ω_case and Ω_control are themselves sparse. However, many real-world applications, such as brain imaging data, only assume that the differential network is sparse.

Directly modeling the differential network II: Differential CLIME. A recent study proposes the following model, which only assumes sparsity of the differential network Δ. Differential CLIME: argmin_Δ ||Δ||_1, subject to ||Σ̂_c Δ Σ̂_d − (Σ̂_c − Σ̂_d)||_∞ ≤ λ_n.   (5.2) However, this method is solved by linear programming with p² variables, so the time complexity is at least O(p⁸). In practice, it takes more than 2 days to finish running when p = 120.

201 Directly modeling the differential network III: Density Ratio
The methods above all make the Gaussian assumption. This method relaxes the model to the exponential-family distribution.
Density Ratio:
p_c(x; θ_c) / p_d(x; θ_d) ∝ exp(\sum_t θ_t f_t(x))    (5.3)
Here θ_t encodes the difference between the two networks for factor f_t.
Density Ratio:
r(x; θ) = (1 / N(θ)) exp(\sum_t θ_t f_t(x))    (5.4)
Here θ_t encodes the difference between the two networks for factor f_t, and N(θ) is a normalization term.

202 Directly modeling the differential network III: Density Ratio
Density Ratio for Markov Random Fields:
KL[p_c || p] = Const. - \int p_c(x) log r(x; θ) dx, where p(x) = p_d(x) r(x; θ).    (5.5)
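One way to make (5.4)-(5.5) operational is the usual empirical version: replace the integral with an average over the case sample and estimate N(θ) on the control (denominator) sample. The sketch below assumes pairwise factors f_t(x) = x_s x_u; both that factor choice and the function names are illustrative assumptions, not the exact construction in the cited work.

```python
import numpy as np

def pairwise_features(X):
    """Pairwise factors f_t(x) = x_s * x_u for s <= u, one row per sample."""
    iu = np.triu_indices(X.shape[1])
    return X[:, iu[0]] * X[:, iu[1]]          # shape (n, p*(p+1)/2)

def neg_log_ratio_objective(theta, X_c, X_d):
    """Empirical version of (5.5): minus the mean, over case samples, of
    log r(x; theta), with the normalizer N(theta) estimated on the control
    (denominator) sample."""
    F_c, F_d = pairwise_features(X_c), pairwise_features(X_d)
    log_N = np.log(np.mean(np.exp(F_d @ theta)))   # empirical normalizer
    return -(np.mean(F_c @ theta) - log_N)
```

Minimizing this objective with an ℓ1 penalty on θ would then yield a sparse estimate of the differential parameters.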

203 Summary
We introduce the multi-task sGGM estimators.
We introduce the multi-task sGGM estimators which directly model the commonalities and differences.
We introduce the multi-task sGGM estimators which directly model the differences.

204 References I
J. Chiquet, Y. Grandvalet, and C. Ambroise. Inferring multiple graphical structures. Statistics and Computing, 21(4), 2011.
P. Danaher, P. Wang, and D. M. Witten. The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2013.
K. Mohan, P. London, M. Fazel, S.-I. Lee, and D. Witten. Node-based learning of multiple Gaussian graphical models. arXiv preprint, 2013.

205 References II
R. P. Monti, C. Anagnostopoulos, and G. Montana. Learning population and subject-specific brain connectivity networks via mixed neighborhood selection. arXiv preprint, 2015.

206 Joint Gaussian Graphical Model Series VIII
A deep introduction to the metrics for evaluating an estimator/learner
Beilun Wang
Advisor: Yanjun Qi
Department of Computer Science, University of Virginia
Sep 22nd, 2017

207 Road Map
[Figure slide; only the title is recoverable from the transcription.]

208 Outline
1 Notation
2 Review
3 The metrics for evaluating an estimator
4 Statistical Convergence Rate
5 Optimization Convergence Rate
6 Computational Complexity

209 Notation

210 Notation
X: the data matrix.
Σ: the covariance matrix.
Ω: the precision matrix.
p: the number of features.
n: the number of samples in the data matrix.
s: the number of non-zero entries in the precision matrix.

211 Review

212 Review from last talk
We introduce different sGGM estimators and their solutions.

213 Review from last talk
We introduce different sGGM estimators and their solutions.
We briefly introduce the three metrics used in evaluating an estimator.

214 Review from last talk
We introduce different sGGM estimators and their solutions.
We briefly introduce the three metrics used in evaluating an estimator.
We introduce different multi-task sGGM estimators and their optimization challenges.

215 The metrics for evaluating an estimator

216 Motivation I: Select a proper estimator
There may be a lot of similar estimators.

217 Motivation I: Select a proper estimator
There may be a lot of similar estimators.
You need to decide which one to use.

218 Motivation I: Select a proper estimator
There may be a lot of similar estimators.
You need to decide which one to use.
You need some metrics to make the decision.
