TP, TN, FP and FN tables for different methods under different parameter settings
Zhu, Yunan
January 11, 2015

Notes:
- A mean of NaN for TP/TN/FP/FN means the corresponding original matrix is a zero matrix.
- True Positives/Negatives: the bigger the better. False Positives/Negatives: the smaller the better. "Better" is defined as closer to the true graph.
- rho = seq(0.01, 1, length = 100)
- The tables below give the mean of the elements of each True Positives / True Negatives / False Positives / False Negatives matrix under each parameter setting.
- lenboo is the number of bootstrap replicates; pi is the threshold value for Bootstrap Glasso; nset is the sample size in each simulation; ndata is the total number of repeats (number of simulations).
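The settings above can be collected in a small sketch. This is a hypothetical stdlib-Python analogue of the report's R setup (the names nset, ndata, lenboo, pi and rho follow the report's terminology; the pi grid matches the values used later in the report):

```python
def make_settings():
    nset = 100        # sample size in each simulation
    ndata = 100       # total number of simulation repeats
    lenboo = 100      # number of bootstrap replicates for Bootstrap Glasso
    pi_grid = [0.75, 0.80, 0.85, 0.90, 0.95]  # Bootstrap Glasso thresholds
    # rho = seq(0.01, 1, length = 100) in R: 100 evenly spaced values
    n = 100
    rho = [0.01 + i * (1.0 - 0.01) / (n - 1) for i in range(n)]
    return nset, ndata, lenboo, pi_grid, rho
```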
nset = 100, ndata = 100

Columns (each evaluated under AIC, BIC and CV):
  TP: Ê^λ_ij ≠ 0 & E^λ_ij ≠ 0 (bigger is better)
  TN: Ê^λ_ij = 0 & E^λ_ij = 0 (bigger is better)
  FP: Ê^λ_ij ≠ 0 & E^λ_ij = 0 (smaller is better)
  FN: Ê^λ_ij = 0 & E^λ_ij ≠ 0 (smaller is better)
Rows: correlation; partial corr; Glasso; Adaptive; SCAD; Boo Glasso with lenboo = 50 (five pi values), lenboo = 100 (five pi values) and lenboo = 200 (three pi values); huge+ric; huge+stars; huge+ebic.
[The numeric entries of this table, and the pi values, did not survive extraction.]
Comments on the changing results of Bootstrap Glasso: (Sometimes the BIC TN and FP for Glasso obtained from running the Bootstrap Glasso code alone (not together with Adaptive Lasso, SCAD and Huge) differ from the results of running the Bootstrap Glasso code together with Adaptive Lasso, SCAD and Huge.)

For lenboo = 50 and pi = 0.75, 0.8, 0.85, I did not apply Bootstrap Glasso alone; those Bootstrap Glasso results come directly from the Glasso-series code (run together with Glasso, Adaptive Glasso and SCAD).

For lenboo = 50, at both pi = 0.9 and pi = 0.95, the BIC TN and FP for the original Glasso change (comparing Bootstrap Glasso applied alone with Bootstrap Glasso run together with Adaptive Lasso and SCAD).

For lenboo = 100, the BIC TN and FP for the original Glasso change under all pi (same comparison).
nset = 200, ndata = 100

Columns: TP, TN, FP and FN (as defined above), each under AIC, BIC and CV.
Rows: correlation; partial corr; Glasso; Adaptive; SCAD; Boo Glasso with lenboo = 50 (five pi values), lenboo = 100 (five pi values) and lenboo = 200 (one pi value); huge+ric; huge+stars; huge+ebic.
[Numeric entries did not survive extraction.]

Comments on the changing results of Bootstrap Glasso: when lenboo = 100, the BIC TN and FP for the original Glasso under pi = 0.75, 0.8, 0.85, 0.9, 0.95 are all unchanged (comparing Bootstrap Glasso applied alone with Bootstrap Glasso run together with Adaptive Lasso and SCAD).
nset = 500, ndata = 100

Columns: TP, TN, FP and FN (as defined above), each under AIC, BIC and CV.
Rows: correlation; partial corr; Glasso; Adaptive; SCAD; Boo Glasso with lenboo = 100 (five pi values) and lenboo = 200 (one pi value); huge+ric; huge+stars; huge+ebic.
[Numeric entries did not survive extraction.]

Comments on the changing results of Bootstrap Glasso: when lenboo = 100, the BIC TN and FP for the original Glasso change under every pi (comparing Bootstrap Glasso applied alone with Bootstrap Glasso run together with Adaptive Lasso and SCAD); TN and FP change to the same pair of values for all pi.
nset = 1000, ndata = 100

Columns: TP, TN, FP and FN (as defined above), each under AIC, BIC and CV.
Rows: correlation; partial corr; Glasso; Adaptive; SCAD; Boo Glasso with lenboo = 100 (five pi values) and lenboo = 200 (one pi value); huge+ric; huge+stars; huge+ebic.
[Numeric entries did not survive extraction.]

Comments on the changing results of Bootstrap Glasso: when lenboo = 100, the BIC TN and FP for the original Glasso under all pi = 0.75, 0.8, 0.85, 0.9, 0.95 do not change (comparing Bootstrap Glasso applied alone with Bootstrap Glasso run together with Adaptive Lasso and SCAD).
Comments on behaviors:

I used 4 criteria to evaluate the performance of 9 methods.

4 criteria:

1. (Mean of) True Positives: Ê^λ_ij ≠ 0 & E^λ_ij ≠ 0, where Ê^λ_ij ≠ 0 denotes that there is an edge between nodes i and j in the estimated graph under regularization parameter λ, and E^λ_ij ≠ 0 denotes that there is an edge between nodes i and j in the true graph. True Positives are therefore the bigger the better: a True Positive records the frequency of estimating an existing edge correctly. A True Positive below one indicates that the estimated graph omits an edge present in the true graph; a mean of True Positives equal to one means the estimated graph contains all the edges of the true graph. The larger, the denser and closer to the truth; the smaller, the sparser. We can regard the (mean of) True Positives as the frequency of estimating a nonzero element as nonzero.
2. (Mean of) True Negatives: Ê^λ_ij = 0 & E^λ_ij = 0, where Ê^λ_ij = 0 denotes that there is no edge between nodes i and j in the estimated graph under regularization parameter λ, and E^λ_ij = 0 denotes that there is no edge between nodes i and j in the true graph. True Negatives are therefore the bigger the better: a True Negative records the frequency of handling a non-existing edge correctly. A mean of True Negatives equal to one means the estimated graph gets all non-existing edges right, i.e. it adds no non-existing edges to the true graph (it is not denser than the true graph). A True Negative below one indicates that a non-existing edge has been added (denser); a larger mean of True Negatives means fewer wrong edges added (sparser). The larger, the sparser and closer to the truth; the smaller, the denser. We can regard the (mean of) True Negatives as the frequency of estimating an exactly-zero element as exactly zero.
3. (Mean of) False Positives: Ê^λ_ij ≠ 0 & E^λ_ij = 0, where Ê^λ_ij ≠ 0 denotes that there is an edge between nodes i and j in the estimated graph under regularization parameter λ, and E^λ_ij = 0 denotes that there is no such edge in the true graph. False Positives are therefore the smaller the better: a False Positive records the frequency of estimating a non-existing edge wrongly. A nonzero False Positive indicates that a non-existing edge has been added to the true graph; larger False Positives indicate estimating a denser graph than the true one. The larger, the denser; the smaller, the sparser and closer to the truth. We can regard the (mean of) False Positives as the frequency of estimating an exactly-zero element as nonzero.

4. (Mean of) False Negatives: Ê^λ_ij = 0 & E^λ_ij ≠ 0, where Ê^λ_ij = 0 denotes that there is no edge between nodes i and j in the estimated graph under regularization parameter λ, and E^λ_ij ≠ 0 denotes that there is such an edge in the true graph. False Negatives are therefore the smaller the better: a False Negative records the frequency of omitting an existing edge of the true graph. A nonzero False Negative indicates that an existing edge has been omitted; larger False Negatives indicate estimating a sparser graph than the true one. The larger, the sparser; the smaller, the denser and closer to the truth. We can regard the (mean of) False Negatives as the frequency of estimating a nonzero element as exactly zero.
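The four criteria can be made concrete with a small sketch (pure Python with a hypothetical helper name; the report's actual computations were done in R). For one off-diagonal entry of the precision matrix, the "mean TP" is the fraction of simulation repeats in which the entry was estimated nonzero while truly nonzero, and similarly for the other three quantities:

```python
def confusion_means(est_list, true_edge):
    """Mean TP/TN/FP/FN for one off-diagonal entry of the precision matrix.

    est_list : list of booleans, one per simulation repeat; True means the
               entry was estimated nonzero (edge present in estimated graph).
    true_edge: True if the entry is nonzero in the true precision matrix.
    """
    n = len(est_list)
    tp = sum(e and true_edge for e in est_list) / n
    tn = sum((not e) and (not true_edge) for e in est_list) / n
    fp = sum(e and (not true_edge) for e in est_list) / n
    fn = sum((not e) and true_edge for e in est_list) / n
    return tp, tn, fp, fn

# A truly present edge, estimated nonzero in 9 of 10 repeats:
print(confusion_means([True] * 9 + [False], True))  # (0.9, 0.0, 0.0, 0.1)
```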
Remark: For each given true precision matrix,

(mean of) True Negatives + (mean of) False Positives = 1,
(mean of) True Positives + (mean of) False Negatives = 1.

Given these properties of TP/TN/FP/FN, the criterion for a better method should be determined not only by the sparsity we expect of the estimated precision matrix, but also by the estimation accuracy we want (the sparsity of the true precision matrix itself affects the choice of method). In other words, one method may give the sparsest estimate but not the estimate closest to the truth; conversely, a denser estimate may be more similar to the true graph.
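The identities hold because, for any single entry, each simulation repeat falls into exactly one of the two relevant cells. A minimal numeric check in pure Python (hypothetical setup; for a truly nonzero entry, every repeat counts either as a TP or as an FN):

```python
import random

random.seed(0)
ndata = 256  # a power of two, so the fractions below are exact floats
true_edge = True  # consider an entry that is nonzero in the true matrix

# Simulate per-repeat estimates for this entry (nonzero with prob. 0.8):
est = [random.random() < 0.8 for _ in range(ndata)]

tp = sum(e for e in est if true_edge) / ndata
fn = sum(not e for e in est if true_edge) / ndata
print(tp + fn)  # 1.0
```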
9 methods:

1. Correlation matrix
2. Partial correlation matrix
3. Glasso
4. Glasso with Adaptive Lasso penalty
5. Glasso with SCAD penalty
6. Bootstrap Glasso
7. Huge with ric (rotation information criterion)
8. Huge with stars (stability approach to regularization selection)
9. Huge with ebic (extended Bayesian information criterion)
1. Computational speed:

Glasso, Adaptive Lasso and SCAD are very fast methods, even when parameters (such as nset) increase. Bootstrap Glasso is slower than the Glasso-series methods because of resampling (affected mainly by lenboo); increasing the number of bootstrap replicates significantly slows it down. (Taking both computational speed and estimation improvement into account, I select lenboo = 100 when comparing Bootstrap Glasso to other methods: a large lenboo leads to very slow computation, whereas a too-small lenboo may not sufficiently show the advantage of Bootstrap Glasso. Using lenboo = 100 here does not mean I am sure 100 is the best choice for lenboo.)

Moreover, I think lenboo should not be greater than nset. When nset is small and we resample many times, each bootstrap sample is very likely to miss some of the distinct original observations, which can lead to a worse result than simply applying Glasso. If in addition lenboo is larger than nset, we are likely to re-estimate many times on poor resamples (and resamples drawn at different times may be highly similar). Consequently, the advantages of Bootstrap Glasso may be outweighed by its disadvantages when lenboo is (much) larger than nset.

Huge is the slowest method (affected mainly by nset); the 'stars' criterion may be the slowest of all because it requires subsampling, similar in spirit to Bootstrap Glasso.

In terms of computational speed, the ranking is: Glasso, Adaptive Lasso, SCAD > Bootstrap Glasso > Huge (with stars slower than ric and ebic).
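One way to see the concern about bootstrap samples missing original observations: a resample of size nset drawn with replacement contains, on average, only about 63.2% of the distinct original observations (1 − 1/e for large nset), so every bootstrap fit ignores roughly a third of the data. A quick stdlib-Python check (illustrative only, not the report's code):

```python
import random

random.seed(1)
nset = 1000   # sample size, as in the report's largest settings
reps = 200    # number of resamples to average over

fracs = []
for _ in range(reps):
    # Draw a bootstrap sample of indices with replacement:
    sample = [random.randrange(nset) for _ in range(nset)]
    fracs.append(len(set(sample)) / nset)  # fraction of distinct originals

avg = sum(fracs) / reps
print(round(avg, 3))  # close to 1 - 1/e ≈ 0.632
```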
2. Estimation accuracy (fix ndata = 100; set nset = 100, 200, 500, 1000): where ndata is the number of simulations, nset is the sample size in each simulation, and pi is the threshold value for Bootstrap Glasso. Explanations are given both numerically and graphically (values and edges).

2.1 How the methods' performance varies as nset changes (from 100 to 1000), and the different behaviors of AIC, BIC and CVerror.

1./2. For the Correlation matrix and the Partial correlation matrix: when nset increases from 100 to 1000, none of the four criteria change at all: the mean of TP remains at 1, the mean of TN at 0, the mean of FP at 1, and the mean of FN at 0. Additionally, we can still get relatively large nonzero estimated elements even when the true element is exactly zero. Overall, these are two rather poor methods for estimating the true precision matrix (true graph): every nonzero element of the true precision matrix is estimated as nonzero (all existing edges are always included in the estimated graph), and every zero element is also estimated as nonzero (all non-existing edges are included as well, making the estimate far denser than the true graph).
3. For Glasso:

The mean of True Positives is always 1 (the estimates of nonzero elements are always nonzero).

The mean of True Negatives is around 0.2 or 0.3 (for AIC, BIC and CVerror) and shows no significant monotone pattern as nset changes. When nset = 100: BIC, CV > AIC (here "AIC" is short for the mean of the True Negatives under AIC). When nset = 200, 500: BIC > CV > AIC. When nset = 1000: BIC > AIC, CV. Therefore BIC is the best criterion for Glasso in terms of True Negatives (estimates of zero elements being exactly zero).

The mean of False Positives is around 0.7 or 0.8 (for AIC, BIC and CVerror) and shows no significant monotone pattern as nset changes. When nset = 100: BIC, CV < AIC. When nset = 200, 500: BIC < CV < AIC. When nset = 1000: BIC < AIC, CV. Therefore BIC is the best criterion for Glasso in terms of False Positives (least often estimating zero elements as nonzero). (This is consistent with the conclusion from True Negatives, since (mean of) TN + (mean of) FP = 1.)

The mean of False Negatives is always 0 (the estimates of nonzero elements are always nonzero).

4. For Glasso with Adaptive Lasso penalty:

The mean of True Positives is always 1 (the estimates of nonzero elements are always nonzero).

The mean of True Negatives is around 0.5 to 0.8 (for AIC, BIC and CVerror) and increases monotonically with nset. When nset = 100: CV > BIC > AIC (around the 0.65, 0.5 level). When nset = 200: BIC > CV > AIC (around the 0.67, 0.52 level). When nset = 500: BIC > CV > AIC (around the 0.8, 0.7 level). When nset = 1000: BIC > CV, AIC (around the 0.85 level). Therefore CVerror is slightly better when nset is small and BIC is best when nset is large.

The mean of False Positives is around 0.2 to 0.5 (for AIC, BIC and CVerror) and decreases monotonically with nset. When nset = 100: CV < BIC < AIC (around the 0.32, 0.5 level). When nset = 200: BIC < CV < AIC (around the 0.33, 0.52 level). When nset = 500: BIC < CV < AIC (around the 0.2, 0.3 level). When nset = 1000: BIC < CV, AIC (around the 0.14, 0.15 level). Therefore CVerror is slightly better when nset is small and BIC is best when nset is large.

The mean of False Negatives is always 0 (the estimates of nonzero elements are always nonzero).
5. For Glasso with SCAD penalty:

The mean of True Positives is SCAD's biggest difference from the other methods, in the sense that it is never 1, for AIC, BIC and CVerror alike (the estimates of nonzero elements are not always nonzero, which means this method may give an even sparser estimate than the truth, i.e. fewer edges than the true graph). Additionally, the mean of True Positives increases monotonically with nset, which does not happen for the other methods. When nset = 100: BIC, AIC > CV. When nset = 200: CV > AIC, BIC. When nset = 500: CV > AIC > BIC. When nset = 1000: AIC > CV > BIC. The differences among AIC, BIC and CVerror are, however, not large.

The mean of True Negatives is around the 0.7 to 0.97 level and increases monotonically with nset (the only exception is BIC from nset = 500 to 1000, which decreases). When nset = 100: BIC > CV > AIC (around the 0.89, 0.85 to 0.7 level). When nset = 200: BIC > CV > AIC (around the 0.90, 0.86, 0.76 level). When nset = 500: BIC > CV > AIC (around the 0.98, 0.945, 0.938 level). When nset = 1000: BIC > CV > AIC (around the 0.97, 0.96 level). Therefore BIC is the best criterion in terms of True Negatives.

The mean of False Positives is around the 0.29 to 0.02 level and decreases monotonically with nset (one exception is BIC from nset = 500 to 1000, which increases). When nset = 100: BIC < CV < AIC (around the 0.105, 0.14 to 0.29 level). When nset = 200: BIC < CV < AIC (around the 0.09, 0.13, 0.24 level). When nset = 500: BIC < CV < AIC (around the 0.018, 0.05, 0.06 level). When nset = 1000: BIC < CV < AIC (around the 0.02, 0.04, 0.05 level). Therefore BIC is the best criterion in terms of False Positives.

The mean of False Negatives for SCAD is also very different from the others: it is never 0, for any nset and for AIC, BIC and CVerror alike (the estimates of nonzero elements are sometimes zero, which never happens for the other methods). It lies around the 0.01 to 0.16 level. When nset = 100: AIC = BIC < CV (around the 0.01, 0.025 level). When nset = 200: CV < AIC = BIC (around the 0.03, 0.035 level). When nset = 500: AIC < BIC < CV (around the 0.08, 0.085, 0.14 level). When nset = 1000: AIC < CV < BIC (around the 0.13, 0.14, 0.165 level). The differences among AIC, BIC and CVerror for each nset are not large, but the pattern of increase with nset is clear. So we might consider AIC the best criterion based on False Negatives.
6. For Bootstrap Glasso:

The mean of True Positives is always 1 (the estimates of nonzero elements are always nonzero).

The mean of True Negatives increases with pi when nset and lenboo are fixed. For each nset and pi, the impact of lenboo on the mean of True Negatives is not very clear, since I only ran lenboo = 50, 100 for nset = 100, 200 and lenboo = 100 for the remaining nset. (Judging only from these two settings, with nset and pi fixed the mean of True Negatives shows no obvious pattern in lenboo.) For each pi and lenboo, the mean of True Negatives first decreases as nset increases from 100 to 200, then increases as nset increases from 200 to 1000 (for AIC, BIC and CVerror alike). As for the comparison of AIC, BIC and CVerror: for nset = 100, 200 and 500, when pi is small (0.75, 0.8), BIC > CV > AIC, whereas when pi = 0.85, 0.9, 0.95, BIC > AIC > CV. For nset = 1000, for the other pi values we have BIC > CV > AIC, while for pi = 0.95 we have BIC > AIC = CV. Therefore BIC is the best criterion for Bootstrap Glasso based on True Negatives.

The mean of False Positives decreases with pi when nset and lenboo are fixed. For each nset and pi, no effect of lenboo on the mean of False Positives is visible. For each pi and lenboo, the mean of False Positives first increases as nset increases from 100 to 200, then decreases as nset increases from 200 to 1000 (for AIC, BIC and CVerror alike). As for the comparison of AIC, BIC and CVerror: for nset = 100, 200 and 500, when pi is small (0.75, 0.8), BIC < CV < AIC, whereas when pi = 0.85, 0.9, 0.95, BIC < AIC < CV. For nset = 1000, for the other pi values BIC < CV < AIC, while for pi = 0.95, BIC < AIC = CV. Therefore BIC is the best criterion for Bootstrap Glasso based on False Positives.

Based on my previous work and the tables shown above, I think the power to improve True Negatives/False Positives ranks as pi > nset > lenboo, which does not yet take into account the enormous time consumed by a relatively large lenboo. (I therefore suggest using a reasonably large pi and nset, and a not-too-small lenboo, such as 100.) (One question remains to be tested and confirmed: should lenboo be large or small in absolute terms, or relative to nset, i.e. is it actually the ratio lenboo/nset that matters?)

The mean of False Negatives is always 0 (the estimates of nonzero elements are always nonzero).
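The role of pi can be sketched as follows (pure Python with hypothetical data; the report's Bootstrap Glasso keeps an edge only if it is selected in at least a fraction pi of the lenboo bootstrap fits, which is what this toy function does):

```python
def bootstrap_select(edge_indicators, pi):
    """Keep edges whose selection frequency across bootstrap fits is >= pi.

    edge_indicators: list of adjacency matrices (lists of lists of 0/1),
                     one per bootstrap replicate.
    pi:              threshold in (0, 1]; larger pi gives a sparser graph.
    """
    lenboo = len(edge_indicators)
    p = len(edge_indicators[0])
    freq = [[sum(a[i][j] for a in edge_indicators) / lenboo
             for j in range(p)] for i in range(p)]
    return [[1 if freq[i][j] >= pi else 0 for j in range(p)]
            for i in range(p)]

# Edge (0,1) selected in 4 of 5 replicates, edge (0,2) in only 2 of 5:
fits = [
    [[0, 1, 1], [1, 0, 0], [1, 0, 0]],
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
    [[0, 1, 1], [1, 0, 0], [1, 0, 0]],
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
]
print(bootstrap_select(fits, 0.75))  # keeps only edge (0, 1)
```

Raising pi from 0.75 toward 0.95 can only remove edges, which matches the report's observation that True Negatives increase (and False Positives decrease) with pi.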
The following Huge results are based on code that runs Huge alone (not together with the Glasso series, i.e. before Bootstrap Glasso).

7. For Huge with ric (rotation information criterion):

The mean of True Positives is always 1 (the estimates of nonzero elements are always nonzero). The mean of True Negatives shows no obvious monotone pattern as nset changes (0.302, 0.255, …, 0.35 for nset = 100, 200, 500, 1000 respectively). The mean of False Positives likewise shows no obvious monotone pattern (…, 0.83, …, … for nset = 100, 200, 500, 1000 respectively). The mean of False Negatives is always 0 (the estimates of nonzero elements are always nonzero).

8. For Huge with stars (stability approach to regularization selection):

The mean of True Positives is always 1 (the estimates of nonzero elements are always nonzero). The mean of True Negatives increases monotonically as nset increases from 100 to 1000 (…, 0.29, …, … respectively). The mean of False Positives decreases monotonically as nset increases from 100 to 1000 (…, 0.71, …, … respectively). The mean of False Negatives is always 0 (the estimates of nonzero elements are never exactly zero).

9. For Huge with ebic (extended Bayesian information criterion):

The mean of True Positives is always 1 (the estimates of nonzero elements are always nonzero). The mean of True Negatives shows no obvious monotone pattern as nset changes (0.4, …, …, … respectively). The mean of False Positives shows no obvious monotone pattern (0.6, …, …, … respectively). The mean of False Negatives is always 0 (the estimates of nonzero elements are always nonzero).
2.2 Comparison among methods, based on True Negatives/False Positives. ">" means better (larger True Negatives / smaller False Positives).

When nset = 100: Boo Glasso + pi(0.95) + BIC > SCAD + BIC > Boo Glasso + pi(0.95) + AIC > SCAD + CV > Boo Glasso + pi(0.9) + BIC > Boo Glasso + pi(0.95) + CV > Boo Glasso + pi(0.85) + BIC > SCAD + AIC > Boo Glasso + pi(0.9) + AIC > Adaptive Lasso + CV > Adaptive Lasso + BIC = Boo Glasso + pi(0.8) + BIC > Boo Glasso + pi(0.9) + CV > Adaptive Lasso + AIC > Boo Glasso + pi(0.75) + BIC > Boo Glasso + pi(0.85) + AIC > Huge + ebic > Boo Glasso + pi(0.85) + CV > Boo Glasso + pi(0.8) + CV > Boo Glasso + pi(0.75) + CV = Glasso + CV > Glasso + BIC > Huge + ric > Boo Glasso + pi(0.8) + AIC > Huge + stars > Boo Glasso + pi(0.75) + AIC > Glasso + AIC > (Partial) Correlation matrix.

When nset = 200: Boo Glasso + pi(0.95) + BIC > SCAD + BIC > SCAD + CV > Boo Glasso + pi(0.9) + BIC > Boo Glasso + pi(0.95) + AIC > Boo Glasso + pi(0.95) + CV > SCAD + AIC > Boo Glasso + pi(0.85) + BIC > Adaptive Lasso + BIC > Adaptive Lasso + CV > Boo Glasso + pi(0.8) + BIC > Boo Glasso + pi(0.9) + AIC > Adaptive Lasso + AIC > Boo Glasso + pi(0.9) + CV > Boo Glasso + pi(0.75) + BIC > Huge + ebic > Boo Glasso + pi(0.85) + AIC > Glasso + BIC > Huge + stars > Boo Glasso + pi(0.85) + CV > Huge + ric > Glasso + CV = Boo Glasso + pi(0.75) + CV = Boo Glasso + pi(0.8) + CV > Boo Glasso + pi(0.8) + AIC > Boo Glasso + pi(0.75) + AIC > Glasso + AIC > (Partial) Correlation matrix.

When nset = 500: SCAD + BIC > Boo Glasso + pi(0.95) + BIC > SCAD + CV > SCAD + AIC > Boo Glasso + pi(0.95) + AIC > Boo Glasso + pi(0.95) + CV > Boo Glasso + pi(0.9) + BIC > Adaptive Lasso + BIC > Boo Glasso + pi(0.85) + BIC > Boo Glasso + pi(0.9) + AIC > Boo Glasso + pi(0.9) + CV > Adaptive Lasso + CV > Adaptive Lasso + AIC > Boo Glasso + pi(0.8) + BIC > Boo Glasso + pi(0.75) + BIC > Boo Glasso + pi(0.85) + AIC > Boo Glasso + pi(0.85) + CV > Huge + stars = Huge + ebic > Glasso + BIC > Huge + ric > Boo Glasso + pi(0.8) + CV > Boo Glasso + pi(0.8) + AIC > Boo Glasso + pi(0.75) + CV > Glasso + CV > Boo Glasso + pi(0.75) + AIC > Glasso + AIC > (Partial) Correlation matrix.

When nset = 1000: SCAD + BIC > SCAD + CV > SCAD + AIC > Boo Glasso + pi(0.95) + BIC > Boo Glasso + pi(0.95) + CV = Boo Glasso + pi(0.95) + AIC > Boo Glasso + pi(0.9) + BIC > Adaptive Lasso + BIC > Adaptive Lasso + AIC = Adaptive Lasso + CV > Boo Glasso + pi(0.9) + CV > Boo Glasso + pi(0.9) + AIC > Boo Glasso + pi(0.85) + BIC > Boo Glasso + pi(0.8) + BIC > Boo Glasso + pi(0.85) + CV > Boo Glasso + pi(0.85) + AIC > Boo Glasso + pi(0.75) + BIC > Boo Glasso + pi(0.8) + CV > Boo Glasso + pi(0.8) + AIC > Huge + stars = Huge + ebic > Glasso + BIC > Huge + ric > Boo Glasso + pi(0.75) + CV > Boo Glasso + pi(0.75) + AIC > Glasso + CV > Glasso + AIC > (Partial) Correlation matrix.
Remarks:

According to the rankings above, when nset = 1000 and ndata = 100, even if lenboo increases to 200 with pi = 0.95, SCAD still performs better than Bootstrap Glasso.

Astonishingly, when nset is small (such as 100), lenboo = 50 performs better than lenboo = 100 or 200; but when nset = 200, lenboo = 200 performs better than lenboo = 50 and 100. Therefore I think lenboo should, at the largest, not be greater than nset.

BIC gives the best estimate compared to AIC and CVerror most of the time. Choosing BIC can, to a certain degree, offset a method's shortcomings and the speed penalty of large parameters. So, surprisingly, using BIC is sometimes a better choice than switching to another, better method or changing parameter values: even without a large pi, simply using the BIC criterion can still give a good estimate.

It seems that when nset is large, pi is large and lenboo is not too small, AIC has the same True Negatives/False Positives as CV. It also seems that when nset is relatively large (500 or 1000), Huge+stars has the same True Negatives/False Positives as Huge+ebic.

For large nset, SCAD is more efficient than Bootstrap Glasso with large pi (even SCAD + CV can beat Boo Glasso + pi(0.95) + BIC, despite our conclusion that BIC is the best criterion); SCAD and Adaptive Lasso start to triumph over Bootstrap Glasso with large pi. The original Glasso + BIC can also outperform Bootstrap Glasso + pi(0.75) + CV or AIC (affirming the superiority of BIC over AIC and CV).
3. Questions and Problems:

1. When running the Adaptive Lasso penalty and SCAD penalty code in R, I always get warning messages like these:

1: In lamhat <= lam : longer object length is not a multiple of shorter object length
2: In a * lam - lamhat : longer object length is not a multiple of shorter object length
3: In lamhat > lam : longer object length is not a multiple of shorter object length
4: In pmax(a * lam - lamhat, 0) * (lamhat > lam)/(a - 1)/lam : longer object length is not a multiple of shorter object length
5: In lam * ((lamhat <= lam) + pmax(a * lam - lamhat, 0) * ... : longer object length is not a multiple of shorter object length

(These are R's vector-recycling warnings: lamhat and lam have different lengths, and the longer length is not a multiple of the shorter one, so R recycles the shorter vector and warns.)

2. BIC TN and FP for Bootstrap Glasso are always slightly different when the Bootstrap Glasso code is run alone versus together with Glasso, Adaptive Glasso and SCAD. These two ways of running SOMETIMES give a few different estimated precision matrices, which does not affect the AIC, CVerror, or BIC TP and FN results, but only the BIC TN and FP for the original Glasso (not the Bootstrap Glasso). (I DON'T KNOW WHY! Are these errors within an allowable range?) The difference in the means of TN and FP is slight, on the order of about 0.001 (0.004 when nset = 100, and when nset = 500).

3. Even using the same sample data (xx, equivalent to x_j, the data of one simulation), running Huge + ric several times gives different results!! (I DON'T KNOW WHY!)

4. Possibly because of Problem 2, whenever Huge is applied AFTER Bootstrap Glasso, the Huge results change (e.g., giving result b).

5. Applying Huge after the correlation matrix, partial correlation matrix, Glasso, Adaptive Glasso and SCAD does not change the Huge results (e.g., giving result a).

6. Applying Huge before Bootstrap Glasso does not change the Huge results either (e.g., giving result a).

7. Therefore I prefer result a, which is what is obtained most of the time. Additionally, I think it is more reliable, since it is obtained directly from the original data in the first run: if we set.seed again and get the same data again, then without running any other commands we get Huge result a (and the results start to change if we do not set the same seed again before applying Huge).
Appendix

nset = 200, ndata = 200

Columns: TP, TN, FP and FN (as defined above), each under AIC, BIC and CV.
Rows: correlation; partial corr; Glasso; Adaptive; SCAD; Boo Glasso with lenboo = 100 (two pi values); huge+ric; huge+stars; huge+ebic.
[Numeric entries did not survive extraction.]
When increasing the number of simulations:

nset = 100, ndata = 200

Columns: TP, TN, FP and FN (as defined above), each under AIC, BIC and CV.
Rows: correlation; partial corr; Glasso; Adaptive; SCAD; huge+ric; huge+stars; huge+ebic.
[Numeric entries did not survive extraction.]

nset = 100, ndata = 500

Same columns and rows as the table above.
[Numeric entries did not survive extraction.]
nset = 100, ndata = 1000

Columns: TP, TN, FP and FN (as defined above), each under AIC, BIC and CV.
Rows: correlation; partial corr; Glasso; Adaptive; SCAD; huge+ric; huge+stars; huge+ebic.
[Numeric entries did not survive extraction.]
Estimating Sparse Graphical Models: Insights Through Simulation. Yunan Zhu. Master of Science, Statistics.
Published in: Advances in Neural Information Processing Systems 8, D S Touretzky, M C Mozer, and M E Hasselmo (eds.), MIT Press, Cambridge, MA, pages 190-196, 1996. Learning with Ensembles: How over-tting
More information2 Nils Andersson and Kostas D. Kokkotas Moreover, the w-mode spectra are qualitatively similar for axial and polar perturbations (for a description of
Mon. Not. R. Astron. Soc. 000, 000{000 (1997) Pulsation modes for increasingly relativistic polytropes Nils Andersson 1 and Kostas D. Kokkotas 2 1 Department of Physics, Washington University, St Louis
More informationWhat Every Programmer Should Know About Floating-Point Arithmetic DRAFT. Last updated: November 3, Abstract
What Every Programmer Should Know About Floating-Point Arithmetic Last updated: November 3, 2014 Abstract The article provides simple answers to the common recurring questions of novice programmers about
More informationFeature selection with high-dimensional data: criteria and Proc. Procedures
Feature selection with high-dimensional data: criteria and Procedures Zehua Chen Department of Statistics & Applied Probability National University of Singapore Conference in Honour of Grace Wahba, June
More informationDirect Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina
Direct Learning: Linear Regression Parametric learning We consider the core function in the prediction rule to be a parametric function. The most commonly used function is a linear function: squared loss:
More informationPerformance Evaluation
Performance Evaluation Confusion Matrix: Detected Positive Negative Actual Positive A: True Positive B: False Negative Negative C: False Positive D: True Negative Recall or Sensitivity or True Positive
More informationDepartment of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim
Tests for trend in more than one repairable system. Jan Terje Kvaly Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim ABSTRACT: If failure time data from several
More informationEvaluation & Credibility Issues
Evaluation & Credibility Issues What measure should we use? accuracy might not be enough. How reliable are the predicted results? How much should we believe in what was learned? Error on the training data
More informationAchilles: Now I know how powerful computers are going to become!
A Sigmoid Dialogue By Anders Sandberg Achilles: Now I know how powerful computers are going to become! Tortoise: How? Achilles: I did curve fitting to Moore s law. I know you are going to object that technological
More informationExponential Functions and Graphs - Grade 11 *
OpenStax-CNX module: m30856 1 Exponential Functions and Graphs - Grade 11 * Rory Adams Free High School Science Texts Project Heather Williams This work is produced by OpenStax-CNX and licensed under the
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write
More information1 Matrices and Systems of Linear Equations
Linear Algebra (part ) : Matrices and Systems of Linear Equations (by Evan Dummit, 207, v 260) Contents Matrices and Systems of Linear Equations Systems of Linear Equations Elimination, Matrix Formulation
More informationStability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models
Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models Han Liu Kathryn Roeder Larry Wasserman Carnegie Mellon University Pittsburgh, PA 15213 Abstract A challenging
More informationStatistical Inference
Statistical Inference Bernhard Klingenberg Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Outline Estimation: Review of concepts
More informationConvergence Complexity of Optimistic Rate Based Flow. Control Algorithms. Computer Science Department, Tel-Aviv University, Israel
Convergence Complexity of Optimistic Rate Based Flow Control Algorithms Yehuda Afek y Yishay Mansour z Zvi Ostfeld x Computer Science Department, Tel-Aviv University, Israel 69978. December 12, 1997 Abstract
More informationComputational Statistics with Application to Bioinformatics. Unit 12: Maximum Likelihood Estimation (MLE) on a Statistical Model
Computational Statistics with Application to Bioinformatics Prof. William H. Press Spring Term, 2008 The University of Texas at Austin Unit 12: Maximum Likelihood Estimation (MLE) on a Statistical Model
More informationCPSC 320 Sample Solution, Reductions and Resident Matching: A Residentectomy
CPSC 320 Sample Solution, Reductions and Resident Matching: A Residentectomy August 25, 2017 A group of residents each needs a residency in some hospital. A group of hospitals each need some number (one
More informationThe Growth of Functions. A Practical Introduction with as Little Theory as possible
The Growth of Functions A Practical Introduction with as Little Theory as possible Complexity of Algorithms (1) Before we talk about the growth of functions and the concept of order, let s discuss why
More informationMITOCW watch?v=t6tqhnxy5wg
MITOCW watch?v=t6tqhnxy5wg PROFESSOR: So what are we trying to do? We're going to try to write a matter wave. We have a particle with energy e and momentum p. e is equal to h bar omega. So you can get
More informationLeast Squares Classification
Least Squares Classification Stephen Boyd EE103 Stanford University November 4, 2017 Outline Classification Least squares classification Multi-class classifiers Classification 2 Classification data fitting
More informationMore Asymptotic Analysis Spring 2018 Discussion 8: March 6, 2018
CS 61B More Asymptotic Analysis Spring 2018 Discussion 8: March 6, 2018 Here is a review of some formulas that you will find useful when doing asymptotic analysis. ˆ N i=1 i = 1 + 2 + 3 + 4 + + N = N(N+1)
More informationLinear Algebra (part 1) : Matrices and Systems of Linear Equations (by Evan Dummit, 2016, v. 2.02)
Linear Algebra (part ) : Matrices and Systems of Linear Equations (by Evan Dummit, 206, v 202) Contents 2 Matrices and Systems of Linear Equations 2 Systems of Linear Equations 2 Elimination, Matrix Formulation
More informationBayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson
Bayesian variable selection via penalized credible regions Brian Reich, NC State Joint work with Howard Bondell and Ander Wilson Brian Reich, NCSU Penalized credible regions 1 Motivation big p, small n
More informationMath 1270 Honors ODE I Fall, 2008 Class notes # 14. x 0 = F (x; y) y 0 = G (x; y) u 0 = au + bv = cu + dv
Math 1270 Honors ODE I Fall, 2008 Class notes # 1 We have learned how to study nonlinear systems x 0 = F (x; y) y 0 = G (x; y) (1) by linearizing around equilibrium points. If (x 0 ; y 0 ) is an equilibrium
More informationLinear Model Selection and Regularization
Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In
More informationGroup exponential penalties for bi-level variable selection
for bi-level variable selection Department of Biostatistics Department of Statistics University of Kentucky July 31, 2011 Introduction In regression, variables can often be thought of as grouped: Indicator
More informationModel Accuracy Measures
Model Accuracy Measures Master in Bioinformatics UPF 2017-2018 Eduardo Eyras Computational Genomics Pompeu Fabra University - ICREA Barcelona, Spain Variables What we can measure (attributes) Hypotheses
More informationEvaluating Classifiers. Lecture 2 Instructor: Max Welling
Evaluating Classifiers Lecture 2 Instructor: Max Welling Evaluation of Results How do you report classification error? How certain are you about the error you claim? How do you compare two algorithms?
More informationOf small numbers with big influence The Sum Of Squares
Of small numbers with big influence The Sum Of Squares Dr. Peter Paul Heym Sum Of Squares Often, the small things make the biggest difference in life. Sometimes these things we do not recognise at first
More informationSatisfaction Equilibrium: Achieving Cooperation in Incomplete Information Games
Satisfaction Equilibrium: Achieving Cooperation in Incomplete Information Games Stéphane Ross and Brahim Chaib-draa Department of Computer Science and Software Engineering Laval University, Québec (Qc),
More informationSparse Permutation Invariant Covariance Estimation: Final Talk
Sparse Permutation Invariant Covariance Estimation: Final Talk David Prince Biostat 572 dprince3@uw.edu May 31, 2012 David Prince (UW) SPICE May 31, 2012 1 / 19 Electronic Journal of Statistics Vol. 2
More informationSTAT 535 Lecture 5 November, 2018 Brief overview of Model Selection and Regularization c Marina Meilă
STAT 535 Lecture 5 November, 2018 Brief overview of Model Selection and Regularization c Marina Meilă mmp@stat.washington.edu Reading: Murphy: BIC, AIC 8.4.2 (pp 255), SRM 6.5 (pp 204) Hastie, Tibshirani
More informationε ε
The 8th International Conference on Computer Vision, July, Vancouver, Canada, Vol., pp. 86{9. Motion Segmentation by Subspace Separation and Model Selection Kenichi Kanatani Department of Information Technology,
More informationAnalysis of the Performance of AdaBoost.M2 for the Simulated Digit-Recognition-Example
Analysis of the Performance of AdaBoost.M2 for the Simulated Digit-Recognition-Example Günther Eibl and Karl Peter Pfeiffer Institute of Biostatistics, Innsbruck, Austria guenther.eibl@uibk.ac.at Abstract.
More informationEssentials of Intermediate Algebra
Essentials of Intermediate Algebra BY Tom K. Kim, Ph.D. Peninsula College, WA Randy Anderson, M.S. Peninsula College, WA 9/24/2012 Contents 1 Review 1 2 Rules of Exponents 2 2.1 Multiplying Two Exponentials
More informationAnnouncements. Problem Set 1 out. Checkpoint due Monday, September 30. Remaining problems due Friday, October 4.
Indirect Proofs Announcements Problem Set 1 out. Checkpoint due Monday, September 30. Grade determined by attempt rather than accuracy. It's okay to make mistakes we want you to give it your best effort,
More informationAn algorithm for solving the graph isomorphism problem
An algorithm for solving the graph isomorphism problem By Lucas Allen Contents -introduction -The problem -The algorithm -Complexity -Examples *Example 1 *Example 2 *Example 3 *Example 4 -Conclusion Introduction
More informationDecision Support. Dr. Johan Hagelbäck.
Decision Support Dr. Johan Hagelbäck johan.hagelback@lnu.se http://aiguy.org Decision Support One of the earliest AI problems was decision support The first solution to this problem was expert systems
More informationIE 5531: Engineering Optimization I
IE 5531: Engineering Optimization I Lecture 15: Nonlinear optimization Prof. John Gunnar Carlsson November 1, 2010 Prof. John Gunnar Carlsson IE 5531: Engineering Optimization I November 1, 2010 1 / 24
More informationPhysics 509: Error Propagation, and the Meaning of Error Bars. Scott Oser Lecture #10
Physics 509: Error Propagation, and the Meaning of Error Bars Scott Oser Lecture #10 1 What is an error bar? Someone hands you a plot like this. What do the error bars indicate? Answer: you can never be
More informationExtended Bayesian Information Criteria for Gaussian Graphical Models
Extended Bayesian Information Criteria for Gaussian Graphical Models Rina Foygel University of Chicago rina@uchicago.edu Mathias Drton University of Chicago drton@uchicago.edu Abstract Gaussian graphical
More informationMITOCW watch?v=wr88_vzfcx4
MITOCW watch?v=wr88_vzfcx4 PROFESSOR: So we're building this story. We had the photoelectric effect. But at this moment, Einstein, in the same year that he was talking about general relativity, he came
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationAn Attempt To Understand Tilling Approach In proving The Littlewood Conjecture
An Attempt To Understand Tilling Approach In proving The Littlewood Conjecture 1 Final Report For Math 899- Dr. Cheung Amira Alkeswani Dr. Cheung I really appreciate your acceptance in adding me to your
More informationOn High-Dimensional Cross-Validation
On High-Dimensional Cross-Validation BY WEI-CHENG HSIAO Institute of Statistical Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 11529, Taiwan hsiaowc@stat.sinica.edu.tw 5 WEI-YING
More informationComputer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo
Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain
More informationDiscrete Probability and State Estimation
6.01, Spring Semester, 2008 Week 12 Course Notes 1 MASSACHVSETTS INSTITVTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science 6.01 Introduction to EECS I Spring Semester, 2008 Week
More informationChapter 11 - Sequences and Series
Calculus and Analytic Geometry II Chapter - Sequences and Series. Sequences Definition. A sequence is a list of numbers written in a definite order, We call a n the general term of the sequence. {a, a
More information198:538 Complexity of Computation Lecture 16 Rutgers University, Spring March 2007
198:538 Complexity of Computation Lecture 16 Rutgers University, Spring 2007 8 March 2007 In this lecture we discuss Shamir s theorem that PSPACE is the set of languages that have interactive proofs with
More informationLecture 15 - NP Completeness 1
CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) February 29, 2018 Lecture 15 - NP Completeness 1 In the last lecture we discussed how to provide
More informationPredicting Protein Interactions with Motifs
Predicting Protein Interactions with Motifs Jessica Long Chetan Sharma Lekan Wang December 12, 2008 1 Background Proteins are essential to almost all living organisms. They are comprised of a long, tangled
More informationChapter 1 Statistical Inference
Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations
More informationComplex Matrix Transformations
Gama Network Presents: Complex Matrix Transformations By By Scott Johnson Gamasutra May 17, 2002 URL: http://www.gamasutra.com/features/20020510/johnson_01.htm Matrix transforms are a ubiquitous aspect
More informationMITOCW ocw f99-lec05_300k
MITOCW ocw-18.06-f99-lec05_300k This is lecture five in linear algebra. And, it will complete this chapter of the book. So the last section of this chapter is two point seven that talks about permutations,
More informationPerformance Evaluation
Performance Evaluation David S. Rosenberg Bloomberg ML EDU October 26, 2017 David S. Rosenberg (Bloomberg ML EDU) October 26, 2017 1 / 36 Baseline Models David S. Rosenberg (Bloomberg ML EDU) October 26,
More informationA vector from the origin to H, V could be expressed using:
Linear Discriminant Function: the linear discriminant function: g(x) = w t x + ω 0 x is the point, w is the weight vector, and ω 0 is the bias (t is the transpose). Two Category Case: In the two category
More informationAn Introduction to Path Analysis
An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving
More informationvalue of the sum standard units
Stat 1001 Winter 1998 Geyer Homework 7 Problem 18.1 20 and 25. Problem 18.2 (a) Average of the box. (1+3+5+7)=4=4. SD of the box. The deviations from the average are,3,,1, 1, 3. The squared deviations
More informationPart I: Preliminary Results. Pak K. Chan, Martine Schlag and Jason Zien. Computer Engineering Board of Studies. University of California, Santa Cruz
Spectral K-Way Ratio-Cut Partitioning Part I: Preliminary Results Pak K. Chan, Martine Schlag and Jason Zien Computer Engineering Board of Studies University of California, Santa Cruz May, 99 Abstract
More informationLinear Regression and Its Applications
Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start
More informationSelecting an Orthogonal or Nonorthogonal Two-Level Design for Screening
Selecting an Orthogonal or Nonorthogonal Two-Level Design for Screening David J. Edwards 1 (with Robert W. Mee 2 and Eric D. Schoen 3 ) 1 Virginia Commonwealth University, Richmond, VA 2 University of
More informationPhysics 509: Bootstrap and Robust Parameter Estimation
Physics 509: Bootstrap and Robust Parameter Estimation Scott Oser Lecture #20 Physics 509 1 Nonparametric parameter estimation Question: what error estimate should you assign to the slope and intercept
More informationCPSC 320 Sample Solution, The Stable Marriage Problem
CPSC 320 Sample Solution, The Stable Marriage Problem September 10, 2016 This is a sample solution that illustrates how we might solve parts of this worksheet. Your answers may vary greatly from ours and
More informationLinear regression methods
Linear regression methods Most of our intuition about statistical methods stem from linear regression. For observations i = 1,..., n, the model is Y i = p X ij β j + ε i, j=1 where Y i is the response
More informationMITOCW watch?v=vu_of9tcjaa
MITOCW watch?v=vu_of9tcjaa The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To
More informationContents 1 Introduction 4 2 Go and genetic programming 4 3 Description of the go board evaluation function 4 4 Fitness Criteria for tness : : :
Go and Genetic Programming Playing Go with Filter Functions S.F. da Silva November 21, 1996 1 Contents 1 Introduction 4 2 Go and genetic programming 4 3 Description of the go board evaluation function
More informationImproved Holt Method for Irregular Time Series
WDS'08 Proceedings of Contributed Papers, Part I, 62 67, 2008. ISBN 978-80-7378-065-4 MATFYZPRESS Improved Holt Method for Irregular Time Series T. Hanzák Charles University, Faculty of Mathematics and
More informationStochastic dominance with imprecise information
Stochastic dominance with imprecise information Ignacio Montes, Enrique Miranda, Susana Montes University of Oviedo, Dep. of Statistics and Operations Research. Abstract Stochastic dominance, which is
More informationPermutation-invariant regularization of large covariance matrices. Liza Levina
Liza Levina Permutation-invariant covariance regularization 1/42 Permutation-invariant regularization of large covariance matrices Liza Levina Department of Statistics University of Michigan Joint work
More informationQ1 (12 points): Chap 4 Exercise 3 (a) to (f) (2 points each)
Q1 (1 points): Chap 4 Exercise 3 (a) to (f) ( points each) Given a table Table 1 Dataset for Exercise 3 Instance a 1 a a 3 Target Class 1 T T 1.0 + T T 6.0 + 3 T F 5.0-4 F F 4.0 + 5 F T 7.0-6 F T 3.0-7
More informationRelative Improvement by Alternative Solutions for Classes of Simple Shortest Path Problems with Uncertain Data
Relative Improvement by Alternative Solutions for Classes of Simple Shortest Path Problems with Uncertain Data Part II: Strings of Pearls G n,r with Biased Perturbations Jörg Sameith Graduiertenkolleg
More informationReading and Writing. Mathematical Proofs. Slides by Arthur van Goetham
Reading and Writing Mathematical Proofs Slides by Arthur van Goetham What is a proof? Why explanations are not proofs What is a proof? A method for establishing truth What establishes truth depends on
More informationWe're in interested in Pr{three sixes when throwing a single dice 8 times}. => Y has a binomial distribution, or in official notation, Y ~ BIN(n,p).
Sampling distributions and estimation. 1) A brief review of distributions: We're in interested in Pr{three sixes when throwing a single dice 8 times}. => Y has a binomial distribution, or in official notation,
More informationDay 4: Shrinkage Estimators
Day 4: Shrinkage Estimators Kenneth Benoit Data Mining and Statistical Learning March 9, 2015 n versus p (aka k) Classical regression framework: n > p. Without this inequality, the OLS coefficients have
More informationOn the Structure of Low Autocorrelation Binary Sequences
On the Structure of Low Autocorrelation Binary Sequences Svein Bjarte Aasestøl University of Bergen, Bergen, Norway December 1, 2005 1 blank 2 Contents 1 Introduction 5 2 Overview 5 3 Denitions 6 3.1 Shift
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationReport on article Universal Quantum Simulator by Seth Lloyd, 1996
Report on article Universal Quantum Simulator by Seth Lloyd, 1996 Louis Duvivier Contents 1 Context and motivations 1 1.1 Quantum computer.......................... 2 1.2 Quantum simulation.........................
More informationChapter 6. Estimation of Confidence Intervals for Nodal Maximum Power Consumption per Customer
Chapter 6 Estimation of Confidence Intervals for Nodal Maximum Power Consumption per Customer The aim of this chapter is to calculate confidence intervals for the maximum power consumption per customer
More informationMEASURING COMPATIBILITY (CLOSENESS) IN WEIGHTED ENVIRONMENTS. WHEN CLOSE REALLY MEANS CLOSE?
MEASURING COMPATIBILITY (CLOSENESS) IN WEIGHTED ENVIRONMENTS. WHEN CLOSE REALLY MEANS CLOSE? Claudio Garuti General Manager Fulcrum Engineering Santiago Chile Claudiogaruti@fulcrum.cl Keywords: Compatibility,
More informationExtended Bayesian Information Criteria for Model Selection with Large Model Spaces
Extended Bayesian Information Criteria for Model Selection with Large Model Spaces Jiahua Chen, University of British Columbia Zehua Chen, National University of Singapore (Biometrika, 2008) 1 / 18 Variable
More information1 Introduction A priority queue is a data structure that maintains a set of elements and supports operations insert, decrease-key, and extract-min. Pr
Buckets, Heaps, Lists, and Monotone Priority Queues Boris V. Cherkassky Central Econ. and Math. Inst. Krasikova St. 32 117418, Moscow, Russia cher@cemi.msk.su Craig Silverstein y Computer Science Department
More information