TP, TN, FP and FN Tables for different methods under different parameters:

TP, TN, FP and FN Tables for different methods under different parameters

Zhu, Yunan
January 11, 2015

Notes:
- A mean of NaN for TP/TN/FP/FN means the corresponding matrix is a zero matrix.
- True P/N: the bigger the better. False P/N: the smaller the better.
- rho = seq(0.01, 1, length = 100).
- "Better" is defined as closer to the true graph.
- The tables below report the mean of the elements in each True Positives / True Negatives / False Positives / False Negatives matrix under the corresponding parameter setting.
- lenboo is the number of bootstrap replications, pi is the threshold value for Bootstrap Glasso, nset is the sample size in each simulation, and ndata is the total number of repetitions (times of simulation).
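To make the parameter names concrete, here is a minimal sketch of the kind of simulation loop that could produce one cell of these tables. It is an assumed reconstruction, not the code used for the study: the true precision matrix Omega, the dimension p and the single value of rho are placeholders, and the convention of averaging TP over the truly nonzero entries (and TN over the truly zero entries) is an assumption consistent with the NaN note above.

```r
library(MASS)      # mvrnorm
library(glasso)    # graphical lasso

set.seed(1)
p     <- 10                           # placeholder dimension
nset  <- 100                          # sample size in each simulation
ndata <- 100                          # total number of repetitions
rho   <- seq(0.01, 1, length = 100)   # regularization grid used throughout

Omega <- diag(p)                      # placeholder sparse true precision matrix
Omega[1, 2] <- Omega[2, 1] <- 0.5
Sigma    <- solve(Omega)
trueEdge <- Omega != 0                # true nonzero pattern

TP <- TN <- FP <- FN <- matrix(0, p, p)
for (k in seq_len(ndata)) {
  x   <- mvrnorm(nset, mu = rep(0, p), Sigma = Sigma)
  fit <- glasso(cov(x), rho = rho[50])    # one rho for illustration; AIC/BIC/CV pick rho in practice
  est <- fit$wi != 0                      # estimated nonzero pattern
  TP  <- TP + ( est &  trueEdge) / ndata  # entrywise indicators averaged over the ndata runs
  TN  <- TN + (!est & !trueEdge) / ndata
  FP  <- FP + ( est & !trueEdge) / ndata
  FN  <- FN + (!est &  trueEdge) / ndata
}
mean(TP[trueEdge]); mean(TN[!trueEdge])   # means of the kind reported in the tables (one convention)
```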

nset = 100, ndata = 100

Columns (each evaluated under AIC, BIC and CV):
- TP: Ê^λ_ij ≠ 0 & E^λ_ij ≠ 0 (bigger is better)
- TN: Ê^λ_ij = 0 & E^λ_ij = 0 (bigger is better)
- FP: Ê^λ_ij ≠ 0 & E^λ_ij = 0 (smaller is better)
- FN: Ê^λ_ij = 0 & E^λ_ij ≠ 0 (smaller is better)

Rows: correlation, partial correlation, Glasso, Adaptive, SCAD, Boo Glasso with lenboo = 50 (five values of pi), lenboo = 100 (five values of pi) and lenboo = 200 (three values of pi), huge+ric, huge+stars, huge+ebic.

[Numeric entries of this table are not reproduced here.]

Comments on the changing results of Bootstrap Glasso:
(Sometimes the BIC TN and FP for Glasso obtained from running the Bootstrap Glasso code alone (not together with Adaptive Lasso, SCAD and Huge) differ from the results obtained when the Bootstrap Glasso code is run together with Adaptive Lasso, SCAD and Huge.)

For lenboo = 50 and pi = 0.75, 0.8, 0.85, I did not apply Bootstrap Glasso alone; those Bootstrap Glasso results come directly from the Glasso series code (run together with Glasso, Adaptive Glasso and SCAD).

For lenboo = 50 and pi = 0.9, 0.95, the BIC TN and FP for the original Glasso change (comparing Bootstrap Glasso applied alone with Bootstrap Glasso run together with Adaptive Lasso and SCAD).

For lenboo = 100, the BIC TN and FP for the original Glasso change under all values of pi (comparing Bootstrap Glasso applied alone with Bootstrap Glasso run together with Adaptive Lasso and SCAD).

nset = 200, ndata = 100

Same column layout as the first table (TP/TN/FP/FN under AIC, BIC and CV). Rows: correlation, partial correlation, Glasso, Adaptive, SCAD, Boo Glasso with lenboo = 50 (five values of pi), lenboo = 100 (five values of pi) and lenboo = 200 (one value of pi), huge+ric, huge+stars, huge+ebic.

[Numeric entries of this table are not reproduced here.]

Comments on the changing results of Bootstrap Glasso: When lenboo = 100, the BIC TN and FP for the original Glasso are unchanged under all of pi = 0.75, 0.8, 0.85, 0.9, 0.95 (comparing Bootstrap Glasso applied alone with Bootstrap Glasso run together with Adaptive Lasso and SCAD).

nset = 500, ndata = 100

Same column layout as the first table. Rows: correlation, partial correlation, Glasso, Adaptive, SCAD, Boo Glasso with lenboo = 100 (five values of pi) and lenboo = 200 (one value of pi), huge+ric, huge+stars, huge+ebic.

[Numeric entries of this table are not reproduced here.]

Comments on the changing results of Bootstrap Glasso: When lenboo = 100, the BIC TN and FP for the original Glasso change under every value of pi (comparing Bootstrap Glasso applied alone with Bootstrap Glasso run together with Adaptive Lasso and SCAD); the TN and FP change to the same values for all pi.

nset = 1000, ndata = 100

Same column layout as the first table. Rows: correlation, partial correlation, Glasso, Adaptive, SCAD, Boo Glasso with lenboo = 100 (five values of pi) and lenboo = 200 (one value of pi), huge+ric, huge+stars, huge+ebic.

[Numeric entries of this table are not reproduced here.]

Comments on the changing results of Bootstrap Glasso: When lenboo = 100, the BIC TN and FP for the original Glasso do not change under any of pi = 0.75, 0.8, 0.85, 0.9, 0.95 (comparing Bootstrap Glasso applied alone with Bootstrap Glasso run together with Adaptive Lasso and SCAD).

Comments on behaviours:

I used 4 criteria to evaluate the performance of 9 methods.

4 criteria:

1. (Mean of) True Positives: Ê^λ_ij ≠ 0 & E^λ_ij ≠ 0, where Ê^λ_ij ≠ 0 denotes that there is an edge between nodes i and j in the estimated graph with respect to regularization parameter λ, and E^λ_ij ≠ 0 denotes that there is an edge between nodes i and j in the true graph with respect to regularization parameter λ. So True Positives are the larger the better: a True Positive records the frequency of estimating an existing edge correctly. A True Positive below one indicates that the estimated graph omits an existing edge of the true graph; a mean of True Positives equal to one signifies that the estimated graph contains all the edges of the true graph. The larger, the denser and the closer to the truth; the smaller, the sparser. We can regard the (mean of) True Positives as the frequency of estimating a nonzero element as nonzero.

2. (Mean of) True Negatives: Ê^λ_ij = 0 & E^λ_ij = 0, where Ê^λ_ij = 0 denotes that there is no edge between nodes i and j in the estimated graph with respect to regularization parameter λ, and E^λ_ij = 0 denotes that there is no edge between nodes i and j in the true graph with respect to regularization parameter λ. So True Negatives are the larger the better: a True Negative records the frequency of estimating a non-existing edge correctly. A mean of True Negatives equal to one signifies that the estimated graph identifies all the non-existing edges correctly, i.e. the estimated graph does not add any non-existing edges to the true graph (it is not denser than the true graph). A True Negative below one indicates that a non-existing edge has been added to the true graph (denser). A larger mean of True Negatives indicates that fewer wrong edges are added to the true graph (sparser). The larger, the sparser and the closer to the truth; the smaller, the denser. We can regard the (mean of) True Negatives as the frequency of estimating an exactly zero element as exactly zero.

3. (Mean of) False Positives: Ê^λ_ij ≠ 0 & E^λ_ij = 0, where Ê^λ_ij ≠ 0 denotes that there is an edge between nodes i and j in the estimated graph with respect to regularization parameter λ, and E^λ_ij = 0 denotes that there is no edge between nodes i and j in the true graph with respect to regularization parameter λ. So False Positives are the smaller the better: a False Positive records the frequency of wrongly estimating a non-existing edge. A nonzero False Positive indicates that a non-existing edge has been added to the true graph. Larger False Positives indicate a denser estimated graph than the true graph. The larger, the denser; the smaller, the sparser and the closer to the truth. We can regard the (mean of) False Positives as the frequency of estimating an exactly zero element as nonzero.

4. (Mean of) False Negatives: Ê^λ_ij = 0 & E^λ_ij ≠ 0, where Ê^λ_ij = 0 denotes that there is no edge between nodes i and j in the estimated graph with respect to regularization parameter λ, and E^λ_ij ≠ 0 denotes that there is an edge between nodes i and j in the true graph with respect to regularization parameter λ. So False Negatives are the smaller the better: a False Negative records the frequency of omitting an existing edge of the true graph. A nonzero False Negative indicates that an existing edge of the true graph has been omitted. Larger False Negatives indicate a sparser estimated graph than the true graph. The larger, the sparser; the smaller, the denser and the closer to the truth. We can regard the (mean of) False Negatives as the frequency of estimating a nonzero element as exactly zero.

Remark: (the mean of) True Negatives + (the mean of) False Positives = 1, and (the mean of) True Positives + (the mean of) False Negatives = 1, for each given true precision matrix.

Inspired by the aforementioned properties of TP/TN/FP/FN, the criterion for a better method should be determined not only by the sparsity we expect of the estimated precision matrix, but also by the estimation accuracy we want (the sparsity of the true precision matrix itself also affects the choice of method). In other words, one method may give the sparsest estimate but not the estimate closest to the truth; analogously, a denser estimate may be more similar to the true graph.
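These identities hold entrywise: for each repetition and each entry, exactly one of the two complementary indicators equals 1 (e.g. for an entry with E^λ_ij = 0, either Ê^λ_ij = 0 or Ê^λ_ij ≠ 0), so the averages must sum to one. Using the objects from the simulation sketch above (same assumptions), a quick check would be:

```r
## Both comparisons should be TRUE up to floating-point error.
all.equal(TN[!trueEdge] + FP[!trueEdge], rep(1, sum(!trueEdge)))  # TN + FP = 1 where E = 0
all.equal(TP[ trueEdge] + FN[ trueEdge], rep(1, sum( trueEdge)))  # TP + FN = 1 where E != 0
```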

9 methods:
1. Correlation matrix
2. Partial correlation matrix
3. Glasso
4. Glasso with Adaptive Lasso penalty
5. Glasso with SCAD penalty
6. Bootstrap Glasso
7. Huge with ric (rotation information criterion)
8. Huge with stars (stability approach to regularization selection)
9. Huge with ebic (extended Bayesian information criterion)
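For orientation, a hedged sketch of how these methods might be called in R. The exact functions, penalty weights and tuning loops used in this study are not reproduced; the weight matrix w below is only a placeholder for the adaptive-lasso or SCAD reweighting, and x stands for one simulated nset × p data matrix.

```r
library(glasso)   # graphical lasso
library(huge)     # huge + ric / stars / ebic

S  <- cov(x)
R  <- cor(x)                    # 1. correlation matrix
PC <- -cov2cor(solve(S))        # 2. partial correlations (off-diagonal entries)

g  <- glasso(S, rho = 0.1)      # 3. Glasso at one value of rho

## 4./5. Adaptive Lasso and SCAD penalties can be passed to glasso as an entrywise
##       penalty matrix built from a pilot estimate; w is a placeholder here.
w  <- matrix(0.1, nrow(S), ncol(S))
ga <- glasso(S, rho = w)

## 6. Bootstrap Glasso: see the resampling sketch after the computational-speed section below.

## 7.-9. Huge with three selection criteria.
h      <- huge(x, method = "glasso")
h_ric  <- huge.select(h, criterion = "ric")
h_star <- huge.select(h, criterion = "stars")
h_ebic <- huge.select(h, criterion = "ebic")
```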

1. Computational speed:

Glasso, Adaptive Lasso and SCAD are really fast methods, even when increasing parameters (such as nset).

Bootstrap Glasso is slower than the Glasso series methods because of resampling (it is affected mainly by lenboo). Increasing the number of bootstrap replications significantly slows it down. (Taking both computational speed and estimation improvement into account, I select lenboo = 100 when comparing Bootstrap Glasso to other methods. A large lenboo leads to very slow computation, whereas a lenboo that is too small may not sufficiently reflect the advantage of Bootstrap Glasso. However, using lenboo = 100 here does not mean I am sure 100 is the best choice for lenboo.) What's more, I think lenboo should not be greater than nset: when nset is small and we resample many times, each bootstrap sample is very likely to miss some of the distinct original observations, which could lead to a worse result than simply applying Glasso. Worse still, with lenboo larger than nset it is very likely that we re-estimate many times from very poor resamples (and the similarity among resamples at different times may be high). Consequently, the advantages of Bootstrap Glasso may be outweighed by its disadvantages when lenboo is (much) larger than nset.

Huge is the slowest method (affected mainly by nset). (The 'stars' criterion may be the slowest because it requires subsampling, which is similar in spirit to Bootstrap Glasso.)

In terms of computational speed, the preferred ordering of the methods is: Glasso, Adaptive Lasso, SCAD > Bootstrap Glasso > Huge (within Huge, stars appears slower than ric and ebic).
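As context for the lenboo and pi discussion, here is a minimal sketch of the Bootstrap Glasso idea as described by the parameters above: resample the rows lenboo times, run glasso on each resample, and keep the edges whose selection frequency reaches the threshold pi. This is an assumed reconstruction, not the original code (in particular, how rho is chosen within each resample is not shown).

```r
library(glasso)

## Assumed sketch of Bootstrap Glasso; x is an nset x p data matrix.
boot_glasso <- function(x, rho, lenboo = 100, pi = 0.9) {
  n <- nrow(x); p <- ncol(x)
  freq <- matrix(0, p, p)                        # edge selection frequencies
  for (b in seq_len(lenboo)) {
    idx  <- sample.int(n, n, replace = TRUE)     # bootstrap resample of the rows
    fit  <- glasso(cov(x[idx, , drop = FALSE]), rho = rho)
    freq <- freq + (fit$wi != 0) / lenboo
  }
  freq >= pi                                     # keep edges selected in a fraction >= pi of resamples
}

## Usage: A <- boot_glasso(x, rho = 0.1, lenboo = 100, pi = 0.9); a larger pi gives a sparser graph.
```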

2. Estimation accuracy (fix ndata = 100; set nset = 100, 200, 500, 1000):

Here ndata is the number of simulations, nset is the sample size in each simulation and pi is the threshold value for Bootstrap Glasso. The explanation is given in both a numeric way and a graphical way (values and edges).

2.1 The variation of the methods' performances as nset changes (from 100 to 1000), and the different behaviours of AIC, BIC and CVerror.

1. For the Correlation matrix, and 2. the Partial correlation matrix: When nset is increased from 100 to 1000, none of the four criteria changes at all: the mean of TP remains at 1, the mean of TN at 0, the mean of FP at 1 and the mean of FN at 0. Additionally, as we all know, we can still get relatively large nonzero estimated elements even when the true element is exactly zero. Overall, these are two rather poor methods for estimating the true precision matrix (true graph), which can be explained as follows:
- all the nonzero elements of the true precision matrix are estimated as nonzero (all the existing edges are always included in the estimated graph);
- all the zero elements of the true precision matrix are estimated as nonzero (all the non-existing edges are included in the estimated graph, which is far denser than the true graph);
- all the zero elements of the true precision matrix are estimated as nonzero (same as the previous situation);
- all the nonzero elements of the true precision matrix are estimated as nonzero (same as the first situation).

3. For Glasso:

The mean of True Positives is always 1 (the estimates of nonzero elements are always nonzero).

The mean of True Negatives is around 0.2 or 0.3 (for AIC, BIC and CVerror) and does not show a clear monotonic pattern as nset changes. When nset = 100, BIC, CV > AIC (here "AIC" is short for the mean of the True Negatives corresponding to AIC). When nset = 200, 500, BIC > CV > AIC. When nset = 1000, BIC > AIC, CV. Therefore BIC is the best criterion for Glasso in terms of True Negatives (estimating zero elements as exactly zero).

The mean of False Positives is around 0.7 or 0.8 (for AIC, BIC and CVerror) and does not show a clear monotonic pattern as nset changes. When nset = 100, BIC, CV < AIC. When nset = 200, 500, BIC < CV < AIC. When nset = 1000, BIC < AIC, CV. Therefore BIC is the best criterion for Glasso in terms of False Positives (least often estimating zero elements as nonzero). (This is consistent with the conclusion from True Negatives, since (the mean of) TN + (the mean of) FP = 1.)

The mean of False Negatives is always 0 (the estimates of nonzero elements are always nonzero).

4. For Glasso with Adaptive Lasso penalty:

The mean of True Positives is always 1 (the estimates of nonzero elements are always nonzero).

The mean of True Negatives is around 0.5 to 0.8 (for AIC, BIC and CVerror) and increases monotonically with nset. When nset = 100, CV > BIC > AIC (at around the 0.65, 0.5 level). When nset = 200, BIC > CV > AIC (at around the 0.67, 0.52 level). When nset = 500, BIC > CV > AIC (at around the 0.8, 0.7 level). When nset = 1000, BIC > CV, AIC (at around the 0.85 level). Therefore CVerror is slightly better when nset is small and BIC is the best when nset is large.

The mean of False Positives is around 0.2 to 0.5 (for AIC, BIC and CVerror) and decreases monotonically with nset. When nset = 100, CV < BIC < AIC (at around the 0.32, 0.5 level). When nset = 200, BIC < CV < AIC (at around the 0.33, 0.52 level). When nset = 500, BIC < CV < AIC (at around the 0.2, 0.3 level). When nset = 1000, BIC < CV, AIC (at around the 0.14, 0.15 level). Therefore CVerror is slightly better when nset is small and BIC is the best when nset is large.

The mean of False Negatives is always 0 (the estimates of nonzero elements are always nonzero).

5. For Glasso with SCAD penalty:

The mean of True Positives is the biggest difference between SCAD and the other methods, in the sense that it is not always 1 for AIC, BIC or CVerror (the estimates of nonzero elements are not always nonzero, which means this method may give an even sparser estimate than the truth, i.e. the number of edges can be smaller than in the true graph). Additionally, the mean of True Positives increases monotonically with nset (at levels below 1, which never happens with the other methods). When nset = 100, BIC, AIC > CV. When nset = 200, CV > AIC, BIC. When nset = 500, CV > AIC > BIC. When nset = 1000, AIC > CV > BIC. The differences among AIC, BIC and CVerror are, however, not big.

The mean of True Negatives is at around the 0.7 to 0.97 level and increases monotonically with nset (the only exception is BIC, which decreases from nset = 500 to 1000). When nset = 100, BIC > CV > AIC (at around the 0.89, 0.85 to 0.7 level). When nset = 200, BIC > CV > AIC (at around the 0.90, 0.86, 0.76 level). When nset = 500, BIC > CV > AIC (at around the 0.98, 0.945, 0.938 level). When nset = 1000, BIC > CV > AIC (at around the 0.97, 0.96 level). Therefore BIC is the best criterion in terms of True Negatives.

The mean of False Positives is at around the 0.29 to 0.02 level and decreases monotonically with nset (one exception is BIC, which increases from nset = 500 to 1000). When nset = 100, BIC < CV < AIC (at around the 0.105, 0.14 to 0.29 level). When nset = 200, BIC < CV < AIC (at around the 0.09, 0.13, 0.24 level). When nset = 500, BIC < CV < AIC (at around the 0.018, 0.05, 0.06 level). When nset = 1000, BIC < CV < AIC (at around the 0.02, 0.04, 0.05 level). Therefore BIC is the best criterion in terms of False Positives.

The mean of False Negatives for SCAD is also very different from the others: it is never 0, for any nset and for all of AIC, BIC and CVerror (the estimates of nonzero elements are sometimes zero, which never happens with the other methods; the level is at around 0.01 to 0.16). When nset = 100, AIC = BIC < CV (at around the 0.01, 0.025 level). When nset = 200, CV < AIC = BIC (at around the 0.03, 0.035 level). When nset = 500, AIC < BIC < CV (at around the 0.08, 0.085, 0.14 level). When nset = 1000, AIC < CV < BIC (at around the 0.13, 0.14, 0.165 level). The differences among AIC, BIC and CVerror for each nset are not big, but the increasing pattern of the mean of False Negatives with nset is clear. So perhaps we could consider AIC the best criterion based on False Negatives.

6. For Bootstrap Glasso:

The mean of True Positives is always 1 (the estimates of nonzero elements are always nonzero).

The mean of True Negatives increases with pi when nset and lenboo are fixed. For each nset and pi, the impact of lenboo on the mean of True Negatives is not very clear, since I only ran lenboo = 50, 100 for nset = 100, 200 and lenboo = 100 for the remaining nset. (Judging only from these two settings, it looks as if, when fixing nset and pi, the mean of True Negatives has no obvious variation pattern with lenboo.) For each pi and lenboo, the mean of True Negatives first decreases as nset increases from 100 to 200, then increases as nset increases from 200 to 1000 (for all of AIC, BIC and CVerror). As for the comparison of AIC, BIC and CVerror: for nset = 100, 200 and 500, when pi is small (0.75, 0.8), BIC > CV > AIC, whereas when pi = 0.85, 0.9, 0.95, BIC > AIC > CV. For nset = 1000, for the smaller values of pi we have BIC > CV > AIC, and for pi = 0.95 we have BIC > AIC = CV. Therefore BIC is the best criterion for Bootstrap Glasso based on True Negatives.

The mean of False Positives decreases with pi when nset and lenboo are fixed. For each nset and pi, the impact of lenboo on the mean of False Positives is not apparent. For each pi and lenboo, the mean of False Positives first increases as nset increases from 100 to 200, then decreases as nset increases from 200 to 1000 (for all of AIC, BIC and CVerror). As for the comparison of AIC, BIC and CVerror: for nset = 100, 200 and 500, when pi is small (0.75, 0.8), BIC < CV < AIC, whereas when pi = 0.85, 0.9, 0.95, BIC < AIC < CV. For nset = 1000, for the smaller values of pi BIC < CV < AIC, and for pi = 0.95 we have BIC < AIC = CV. Therefore BIC is the best criterion for Bootstrap Glasso based on False Positives.

Based on my previous work and the tables shown above, I think the order of influence on True Negatives/False Positives is pi > nset > lenboo, which does not yet take into account the enormous time consumed by a relatively large lenboo. (Therefore I would suggest using a reasonably large pi and nset, and a not-too-small lenboo, such as 100.) (One question remains to be tested and confirmed: should lenboo be large or small in absolute terms, or relative to nset, i.e. does the ratio lenboo/nset actually matter?)

The mean of False Negatives is always 0 (the estimates of nonzero elements are always nonzero).

The following Huge results are based on code that runs Huge alone (not together with the Glasso series, i.e. before Bootstrap Glasso).

7. For Huge with ric (rotation information criterion):

The mean of True Positives is always 1 (the estimates of nonzero elements are always nonzero).
The mean of True Negatives shows no obvious monotonic pattern as nset changes (0.302, 0.255, …, 0.35 for nset = 100, 200, 500, 1000 respectively).
The mean of False Positives shows no obvious monotonic pattern as nset changes (…, 0.83, …, … for nset = 100, 200, 500, 1000 respectively).
The mean of False Negatives is always 0 (the estimates of nonzero elements are always nonzero).

8. For Huge with stars (stability approach to regularization selection):

The mean of True Positives is always 1 (the estimates of nonzero elements are always nonzero).
The mean of True Negatives increases monotonically as nset increases from 100 to 1000 (…, 0.29, …, … for nset = 100, 200, 500, 1000 respectively).
The mean of False Positives decreases monotonically as nset increases from 100 to 1000 (…, 0.71, …, … for nset = 100, 200, 500, 1000 respectively).
The mean of False Negatives is always 0 (the estimates of nonzero elements are never exactly zero).

9. For Huge with ebic (extended Bayesian information criterion):

The mean of True Positives is always 1 (the estimates of nonzero elements are always nonzero).
The mean of True Negatives shows no obvious monotonic pattern as nset changes (0.4, …, …, … for nset = 100, 200, 500, 1000 respectively).
The mean of False Positives shows no obvious monotonic pattern as nset changes (0.6, …, …, … for nset = 100, 200, 500, 1000 respectively).
The mean of False Negatives is always 0 (the estimates of nonzero elements are always nonzero).
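Since AIC, BIC and CVerror are compared throughout Section 2.1, here is a hedged sketch of what BIC-type selection of rho along a glasso path might look like. The exact AIC/BIC/CVerror definitions used in this study are not reproduced; the sketch assumes the common form BIC = -2*loglik + log(n) * (number of selected edges).

```r
library(glasso)

## Hedged sketch: pick the rho on the grid that minimizes a BIC-type score.
select_bic <- function(x, rho = seq(0.01, 1, length = 100)) {
  n <- nrow(x); S <- cov(x)
  bic <- sapply(rho, function(r) {
    Om <- glasso(S, rho = r)$wi
    ld <- as.numeric(determinant(Om, logarithm = TRUE)$modulus)   # log det(Omega)
    df <- sum(Om[upper.tri(Om)] != 0)                             # number of selected edges
    -n * (ld - sum(diag(S %*% Om))) + log(n) * df                 # -2*loglik + log(n)*df
  })
  rho[which.min(bic)]
}

## Usage: rho_bic <- select_bic(x)   # x: one simulated nset x p data matrix
```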

2.2 Comparison among methods, based on True Negatives/False Positives. ">" means better (larger True Negatives / smaller False Positives).

When nset = 100: Boo Glasso + pi(0.95) + BIC > SCAD + BIC > Boo Glasso + pi(0.95) + AIC > SCAD + CV > Boo Glasso + pi(0.9) + BIC > Boo Glasso + pi(0.95) + CV > Boo Glasso + pi(0.85) + BIC > SCAD + AIC > Boo Glasso + pi(0.9) + AIC > Adaptive Lasso + CV > Adaptive Lasso + BIC = Boo Glasso + pi(0.8) + BIC > Boo Glasso + pi(0.9) + CV > Adaptive Lasso + AIC > Boo Glasso + pi(0.75) + BIC > Boo Glasso + pi(0.85) + AIC > Huge + ebic > Boo Glasso + pi(0.85) + CV > Boo Glasso + pi(0.8) + CV > Boo Glasso + pi(0.75) + CV = Glasso + CV > Glasso + BIC > Huge + ric > Boo Glasso + pi(0.8) + AIC > Huge + stars > Boo Glasso + pi(0.75) + AIC > Glasso + AIC > (Partial) Correlation matrix.

When nset = 200: Boo Glasso + pi(0.95) + BIC > SCAD + BIC > SCAD + CV > Boo Glasso + pi(0.9) + BIC > Boo Glasso + pi(0.95) + AIC > Boo Glasso + pi(0.95) + CV > SCAD + AIC > Boo Glasso + pi(0.85) + BIC > Adaptive Lasso + BIC > Adaptive Lasso + CV > Boo Glasso + pi(0.8) + BIC > Boo Glasso + pi(0.9) + AIC > Adaptive Lasso + AIC > Boo Glasso + pi(0.9) + CV > Boo Glasso + pi(0.75) + BIC > Huge + ebic > Boo Glasso + pi(0.85) + AIC > Glasso + BIC > Huge + stars > Boo Glasso + pi(0.85) + CV > Huge + ric > Glasso + CV = Boo Glasso + pi(0.75) + CV = Boo Glasso + pi(0.8) + CV > Boo Glasso + pi(0.8) + AIC > Boo Glasso + pi(0.75) + AIC > Glasso + AIC > (Partial) Correlation matrix.

When nset = 500: SCAD + BIC > Boo Glasso + pi(0.95) + BIC > SCAD + CV > SCAD + AIC > Boo Glasso + pi(0.95) + AIC > Boo Glasso + pi(0.95) + CV > Boo Glasso + pi(0.9) + BIC > Adaptive Lasso + BIC > Boo Glasso + pi(0.85) + BIC > Boo Glasso + pi(0.9) + AIC > Boo Glasso + pi(0.9) + CV > Adaptive Lasso + CV > Adaptive Lasso + AIC > Boo Glasso + pi(0.8) + BIC > Boo Glasso + pi(0.75) + BIC > Boo Glasso + pi(0.85) + AIC > Boo Glasso + pi(0.85) + CV > Huge + stars = Huge + ebic > Glasso + BIC > Huge + ric > Boo Glasso + pi(0.8) + CV > Boo Glasso + pi(0.8) + AIC > Boo Glasso + pi(0.75) + CV > Glasso + CV > Boo Glasso + pi(0.75) + AIC > Glasso + AIC > (Partial) Correlation matrix.

When nset = 1000: SCAD + BIC > SCAD + CV > SCAD + AIC > Boo Glasso + pi(0.95) + BIC > Boo Glasso + pi(0.95) + CV = Boo Glasso + pi(0.95) + AIC > Boo Glasso + pi(0.9) + BIC > Adaptive Lasso + BIC > Adaptive Lasso + AIC = Adaptive Lasso + CV > Boo Glasso + pi(0.9) + CV > Boo Glasso + pi(0.9) + AIC > Boo Glasso + pi(0.85) + BIC > Boo Glasso + pi(0.8) + BIC > Boo Glasso + pi(0.85) + CV > Boo Glasso + pi(0.85) + AIC > Boo Glasso + pi(0.75) + BIC > Boo Glasso + pi(0.8) + CV > Boo Glasso + pi(0.8) + AIC > Huge + stars = Huge + ebic > Glasso + BIC > Huge + ric > Boo Glasso + pi(0.75) + CV > Boo Glasso + pi(0.75) + AIC > Glasso + CV > Glasso + AIC > (Partial) Correlation matrix.

Remarks:

According to the tables above, when nset = 1000 and ndata = 100, even if lenboo is increased to 200 with pi = 0.95, SCAD still performs better than Bootstrap Glasso.

Astonishingly, when nset is small (such as 100), lenboo = 50 performs better than lenboo = 100 or 200, whereas when nset = 200, lenboo = 200 performs better than lenboo = 50 and 100. Therefore I think lenboo should at most not be greater than nset.

BIC gives the best estimates compared to AIC and CVerror most of the time. Choosing BIC can, to a certain degree, offset the shortcomings of a method and the slow computation caused by large parameters. So, surprisingly, using BIC is sometimes an even better choice than switching to another, better method or changing parameter values; that is, without a large pi, simply by using the BIC criterion we can still get a good estimate.

It seems that when nset is large, pi is large and lenboo is not too small, AIC has the same True Negatives/False Positives as CV.

It seems that when nset is relatively large (500 or 1000), Huge + stars has the same True Negatives/False Positives as Huge + ebic.

For large nset, SCAD is more efficient than Bootstrap Glasso with large pi (even SCAD + CV can be better than Boo Glasso + pi(0.95) + BIC, after we concluded that BIC is the best criterion). SCAD and Adaptive Lasso start to triumph over Bootstrap Glasso with large pi. The original Glasso + BIC can also outperform Bootstrap Glasso + pi(0.75) + CV, AIC (which affirms the superiority of BIC over AIC and CV).

3. Questions and Problems:

1. When running the Adaptive Lasso penalty and SCAD penalty codes in R, I always get warning messages like these (see the sketch at the end of this section):
1: In lamhat <= lam : longer object length is not a multiple of shorter object length
2: In a * lam - lamhat : longer object length is not a multiple of shorter object length
3: In lamhat > lam : longer object length is not a multiple of shorter object length
4: In pmax(a * lam - lamhat, 0) * (lamhat > lam)/(a - 1)/lam : longer object length is not a multiple of shorter object length
5: In lam * ((lamhat <= lam) + pmax(a * lam - lamhat, 0) * ... : longer object length is not a multiple of shorter object length

2. The BIC TN and FP for Bootstrap Glasso are always slightly different when the Bootstrap Glasso code is run alone than when it is run together with Glasso, Adaptive Glasso and SCAD. These two ways of running the code will SOMETIMES give a few different estimated precision matrices, which does not affect the AIC, CVerror and BIC TP and FN results, but only the BIC TN and FP of the original Glasso (not of Bootstrap Glasso). (I DON'T KNOW WHY! Are these errors within an allowable range?) The difference in the means of TN and FP is slight, on the order of around 0.001 (0.004 when nset = 100 and when nset = 500).

3. Even using the same sample data (xx, equivalent to x_j, the data from one simulation), running Huge + ric several times will give different results! (I DON'T KNOW WHY!)

4. Maybe due to Problem 2, whenever Huge is applied AFTER Bootstrap Glasso, the Huge results change (e.g. we get result b).

5. Applying Huge after the correlation matrix, partial correlation matrix, Glasso, Adaptive Glasso and SCAD does not change the Huge results (e.g. we get result a).

6. Applying Huge before Bootstrap Glasso does not change the Huge results either (e.g. we get result a).

7. Therefore I prefer result a, which is the one obtained most of the time. Additionally, I think it is more reliable since it is obtained directly from the original data in the first run: if we set the seed again and get the same data again, then without running any other commands we will get Huge result a (and the results will start to change if we do not set the same seed again before applying Huge).
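Regarding Problem 1: the expressions in these warnings look like the SCAD derivative p'_λ(θ̂) = λ { I(θ̂ ≤ λ) + (aλ − θ̂)_+ / ((a − 1)λ) I(θ̂ > λ) } used to build entrywise penalty weights from a pilot estimate. R emits this particular warning when the two objects in an elementwise operation have lengths that are not multiples of each other and get recycled, so a likely (but unverified) cause is that lamhat (the pilot estimates) and lam (the vector of candidate rho values) have mismatched lengths. A tiny hedged reproduction, where a, lam and lamhat are stand-ins taken from the warning messages rather than from the original code:

```r
## Hypothetical reproduction of the recycling warning.
a      <- 3.7                              # usual SCAD shape parameter
lam    <- seq(0.01, 1, length = 100)       # candidate regularization values (length 100)
lamhat <- runif(45)                        # e.g. p(p-1)/2 = 45 pilot |estimates| for p = 10

## Elementwise SCAD derivative; 100 and 45 are not multiples of each other,
## so R recycles and warns "longer object length is not a multiple of shorter object length".
w <- lam * ((lamhat <= lam) + pmax(a * lam - lamhat, 0) * (lamhat > lam) / (a - 1) / lam)

## The warnings disappear if the weights are evaluated one lam at a time
## (or with equal-length objects), e.g.:
w1 <- lam[1] * ((lamhat <= lam[1]) +
                pmax(a * lam[1] - lamhat, 0) * (lamhat > lam[1]) / (a - 1) / lam[1])
```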

Appendix

nset = 200, ndata = 200

Same column layout as the first table (TP/TN/FP/FN under AIC, BIC and CV). Rows: correlation, partial correlation, Glasso, Adaptive, SCAD, Boo Glasso with lenboo = 100 (two values of pi), huge+ric, huge+stars, huge+ebic.

[Numeric entries of this table are not reproduced here.]

When increasing the number of simulations:

nset = 100, ndata = 200

Same column layout as the first table. Rows: correlation, partial correlation, Glasso, Adaptive, SCAD, huge+ric, huge+stars, huge+ebic.

[Numeric entries of this table are not reproduced here.]

nset = 100, ndata = 500

Same column layout as the first table. Rows: correlation, partial correlation, Glasso, Adaptive, SCAD, huge+ric, huge+stars, huge+ebic.

[Numeric entries of this table are not reproduced here.]

nset = 100, ndata = 1000

Same column layout as the first table. Rows: correlation, partial correlation, Glasso, Adaptive, SCAD, huge+ric, huge+stars, huge+ebic.

[Numeric entries of this table are not reproduced here.]


More information

We're in interested in Pr{three sixes when throwing a single dice 8 times}. => Y has a binomial distribution, or in official notation, Y ~ BIN(n,p).

We're in interested in Pr{three sixes when throwing a single dice 8 times}. => Y has a binomial distribution, or in official notation, Y ~ BIN(n,p). Sampling distributions and estimation. 1) A brief review of distributions: We're in interested in Pr{three sixes when throwing a single dice 8 times}. => Y has a binomial distribution, or in official notation,

More information

Day 4: Shrinkage Estimators

Day 4: Shrinkage Estimators Day 4: Shrinkage Estimators Kenneth Benoit Data Mining and Statistical Learning March 9, 2015 n versus p (aka k) Classical regression framework: n > p. Without this inequality, the OLS coefficients have

More information

On the Structure of Low Autocorrelation Binary Sequences

On the Structure of Low Autocorrelation Binary Sequences On the Structure of Low Autocorrelation Binary Sequences Svein Bjarte Aasestøl University of Bergen, Bergen, Norway December 1, 2005 1 blank 2 Contents 1 Introduction 5 2 Overview 5 3 Denitions 6 3.1 Shift

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and

More information

Report on article Universal Quantum Simulator by Seth Lloyd, 1996

Report on article Universal Quantum Simulator by Seth Lloyd, 1996 Report on article Universal Quantum Simulator by Seth Lloyd, 1996 Louis Duvivier Contents 1 Context and motivations 1 1.1 Quantum computer.......................... 2 1.2 Quantum simulation.........................

More information

Chapter 6. Estimation of Confidence Intervals for Nodal Maximum Power Consumption per Customer

Chapter 6. Estimation of Confidence Intervals for Nodal Maximum Power Consumption per Customer Chapter 6 Estimation of Confidence Intervals for Nodal Maximum Power Consumption per Customer The aim of this chapter is to calculate confidence intervals for the maximum power consumption per customer

More information

MEASURING COMPATIBILITY (CLOSENESS) IN WEIGHTED ENVIRONMENTS. WHEN CLOSE REALLY MEANS CLOSE?

MEASURING COMPATIBILITY (CLOSENESS) IN WEIGHTED ENVIRONMENTS. WHEN CLOSE REALLY MEANS CLOSE? MEASURING COMPATIBILITY (CLOSENESS) IN WEIGHTED ENVIRONMENTS. WHEN CLOSE REALLY MEANS CLOSE? Claudio Garuti General Manager Fulcrum Engineering Santiago Chile Claudiogaruti@fulcrum.cl Keywords: Compatibility,

More information

Extended Bayesian Information Criteria for Model Selection with Large Model Spaces

Extended Bayesian Information Criteria for Model Selection with Large Model Spaces Extended Bayesian Information Criteria for Model Selection with Large Model Spaces Jiahua Chen, University of British Columbia Zehua Chen, National University of Singapore (Biometrika, 2008) 1 / 18 Variable

More information

1 Introduction A priority queue is a data structure that maintains a set of elements and supports operations insert, decrease-key, and extract-min. Pr

1 Introduction A priority queue is a data structure that maintains a set of elements and supports operations insert, decrease-key, and extract-min. Pr Buckets, Heaps, Lists, and Monotone Priority Queues Boris V. Cherkassky Central Econ. and Math. Inst. Krasikova St. 32 117418, Moscow, Russia cher@cemi.msk.su Craig Silverstein y Computer Science Department

More information