TP, TN, FP and FN tables for different methods under different parameter settings
Zhu, Yunan
January 11, 2015

Notes:
- A mean of NaN for TP/TN/FP/FN means the corresponding original matrix is a zero matrix.
- True Positives/Negatives: the bigger the better. False Positives/Negatives: the smaller the better. "Better" is defined as closer to the true graph.
- rho = seq(0.01, 1, length = 100)
- The tables below give the mean of the elements of each True Positives / True Negatives / False Positives / False Negatives matrix under each parameter setting.
- lenboo is the number of bootstrap replicates; pi is the threshold value for Bootstrap Glasso; nset is the sample size in each simulation; ndata is the total number of repeats (number of simulations).
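The settings above can be collected in a small sketch. This is a hypothetical stdlib-Python analogue of the report's R setup (the names nset, ndata, lenboo, pi and rho follow the report's terminology; the pi grid matches the values used later in the report):

```python
def make_settings():
    nset = 100        # sample size in each simulation
    ndata = 100       # total number of simulation repeats
    lenboo = 100      # number of bootstrap replicates for Bootstrap Glasso
    pi_grid = [0.75, 0.80, 0.85, 0.90, 0.95]  # Bootstrap Glasso thresholds
    # rho = seq(0.01, 1, length = 100) in R: 100 evenly spaced values
    n = 100
    rho = [0.01 + i * (1.0 - 0.01) / (n - 1) for i in range(n)]
    return nset, ndata, lenboo, pi_grid, rho
```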
nset = 100, ndata = 100

Columns (each evaluated under AIC, BIC and CV):
  TP: Ê^λ_ij ≠ 0 & E^λ_ij ≠ 0 (bigger is better)
  TN: Ê^λ_ij = 0 & E^λ_ij = 0 (bigger is better)
  FP: Ê^λ_ij ≠ 0 & E^λ_ij = 0 (smaller is better)
  FN: Ê^λ_ij = 0 & E^λ_ij ≠ 0 (smaller is better)
Rows: correlation; partial corr; Glasso; Adaptive; SCAD; Boo Glasso with lenboo = 50 (five pi values), lenboo = 100 (five pi values) and lenboo = 200 (three pi values); huge+ric; huge+stars; huge+ebic.
[The numeric entries of this table, and the pi values, did not survive extraction.]
Comments on the changing results of Bootstrap Glasso: (Sometimes the BIC TN and FP for Glasso obtained from running the Bootstrap Glasso code alone (not together with Adaptive Lasso, SCAD and Huge) differ from the results of running the Bootstrap Glasso code together with Adaptive Lasso, SCAD and Huge.)

For lenboo = 50 and pi = 0.75, 0.8, 0.85, I did not apply Bootstrap Glasso alone; those Bootstrap Glasso results come directly from the Glasso-series code (run together with Glasso, Adaptive Glasso and SCAD).

For lenboo = 50, at both pi = 0.9 and pi = 0.95, the BIC TN and FP for the original Glasso change (comparing Bootstrap Glasso applied alone with Bootstrap Glasso run together with Adaptive Lasso and SCAD).

For lenboo = 100, the BIC TN and FP for the original Glasso change under all pi (same comparison).
nset = 200, ndata = 100

Columns: TP, TN, FP and FN (as defined above), each under AIC, BIC and CV.
Rows: correlation; partial corr; Glasso; Adaptive; SCAD; Boo Glasso with lenboo = 50 (five pi values), lenboo = 100 (five pi values) and lenboo = 200 (one pi value); huge+ric; huge+stars; huge+ebic.
[Numeric entries did not survive extraction.]

Comments on the changing results of Bootstrap Glasso: when lenboo = 100, the BIC TN and FP for the original Glasso under pi = 0.75, 0.8, 0.85, 0.9, 0.95 are all unchanged (comparing Bootstrap Glasso applied alone with Bootstrap Glasso run together with Adaptive Lasso and SCAD).
nset = 500, ndata = 100

Columns: TP, TN, FP and FN (as defined above), each under AIC, BIC and CV.
Rows: correlation; partial corr; Glasso; Adaptive; SCAD; Boo Glasso with lenboo = 100 (five pi values) and lenboo = 200 (one pi value); huge+ric; huge+stars; huge+ebic.
[Numeric entries did not survive extraction.]

Comments on the changing results of Bootstrap Glasso: when lenboo = 100, the BIC TN and FP for the original Glasso change under every pi (comparing Bootstrap Glasso applied alone with Bootstrap Glasso run together with Adaptive Lasso and SCAD); TN and FP change to the same pair of values for all pi.
nset = 1000, ndata = 100

Columns: TP, TN, FP and FN (as defined above), each under AIC, BIC and CV.
Rows: correlation; partial corr; Glasso; Adaptive; SCAD; Boo Glasso with lenboo = 100 (five pi values) and lenboo = 200 (one pi value); huge+ric; huge+stars; huge+ebic.
[Numeric entries did not survive extraction.]

Comments on the changing results of Bootstrap Glasso: when lenboo = 100, the BIC TN and FP for the original Glasso under all pi = 0.75, 0.8, 0.85, 0.9, 0.95 do not change (comparing Bootstrap Glasso applied alone with Bootstrap Glasso run together with Adaptive Lasso and SCAD).
Comments on behaviors:

I used 4 criteria to evaluate the performance of 9 methods.

4 criteria:

1. (Mean of) True Positives: Ê^λ_ij ≠ 0 & E^λ_ij ≠ 0, where Ê^λ_ij ≠ 0 denotes that there is an edge between nodes i and j in the estimated graph under regularization parameter λ, and E^λ_ij ≠ 0 denotes that there is an edge between nodes i and j in the true graph. True Positives are therefore the bigger the better: a True Positive records the frequency of estimating an existing edge correctly. A True Positive below one indicates that the estimated graph omits an edge present in the true graph; a mean of True Positives equal to one means the estimated graph contains all the edges of the true graph. The larger, the denser and closer to the truth; the smaller, the sparser. We can regard the (mean of) True Positives as the frequency of estimating a nonzero element as nonzero.
2. (Mean of) True Negatives: Ê^λ_ij = 0 & E^λ_ij = 0, where Ê^λ_ij = 0 denotes that there is no edge between nodes i and j in the estimated graph under regularization parameter λ, and E^λ_ij = 0 denotes that there is no edge between nodes i and j in the true graph. True Negatives are therefore the bigger the better: a True Negative records the frequency of handling a non-existing edge correctly. A mean of True Negatives equal to one means the estimated graph gets all non-existing edges right, i.e. it adds no non-existing edges to the true graph (it is not denser than the true graph). A True Negative below one indicates that a non-existing edge has been added (denser); a larger mean of True Negatives means fewer wrong edges added (sparser). The larger, the sparser and closer to the truth; the smaller, the denser. We can regard the (mean of) True Negatives as the frequency of estimating an exactly-zero element as exactly zero.
3. (Mean of) False Positives: Ê^λ_ij ≠ 0 & E^λ_ij = 0, where Ê^λ_ij ≠ 0 denotes that there is an edge between nodes i and j in the estimated graph under regularization parameter λ, and E^λ_ij = 0 denotes that there is no such edge in the true graph. False Positives are therefore the smaller the better: a False Positive records the frequency of estimating a non-existing edge wrongly. A nonzero False Positive indicates that a non-existing edge has been added to the true graph; larger False Positives indicate estimating a denser graph than the true one. The larger, the denser; the smaller, the sparser and closer to the truth. We can regard the (mean of) False Positives as the frequency of estimating an exactly-zero element as nonzero.

4. (Mean of) False Negatives: Ê^λ_ij = 0 & E^λ_ij ≠ 0, where Ê^λ_ij = 0 denotes that there is no edge between nodes i and j in the estimated graph under regularization parameter λ, and E^λ_ij ≠ 0 denotes that there is such an edge in the true graph. False Negatives are therefore the smaller the better: a False Negative records the frequency of omitting an existing edge of the true graph. A nonzero False Negative indicates that an existing edge has been omitted; larger False Negatives indicate estimating a sparser graph than the true one. The larger, the sparser; the smaller, the denser and closer to the truth. We can regard the (mean of) False Negatives as the frequency of estimating a nonzero element as exactly zero.
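The four criteria can be made concrete with a small sketch (pure Python with a hypothetical helper name; the report's actual computations were done in R). For one off-diagonal entry of the precision matrix, the "mean TP" is the fraction of simulation repeats in which the entry was estimated nonzero while truly nonzero, and similarly for the other three quantities:

```python
def confusion_means(est_list, true_edge):
    """Mean TP/TN/FP/FN for one off-diagonal entry of the precision matrix.

    est_list : list of booleans, one per simulation repeat; True means the
               entry was estimated nonzero (edge present in estimated graph).
    true_edge: True if the entry is nonzero in the true precision matrix.
    """
    n = len(est_list)
    tp = sum(e and true_edge for e in est_list) / n
    tn = sum((not e) and (not true_edge) for e in est_list) / n
    fp = sum(e and (not true_edge) for e in est_list) / n
    fn = sum((not e) and true_edge for e in est_list) / n
    return tp, tn, fp, fn

# A truly present edge, estimated nonzero in 9 of 10 repeats:
print(confusion_means([True] * 9 + [False], True))  # (0.9, 0.0, 0.0, 0.1)
```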
Remark: For each given true precision matrix,

(mean of) True Negatives + (mean of) False Positives = 1,
(mean of) True Positives + (mean of) False Negatives = 1.

Given these properties of TP/TN/FP/FN, the criterion for a better method should be determined not only by the sparsity we expect of the estimated precision matrix, but also by the estimation accuracy we want (the sparsity of the true precision matrix itself affects the choice of method). In other words, one method may give the sparsest estimate but not the estimate closest to the truth; conversely, a denser estimate may be more similar to the true graph.
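The identities hold because, for any single entry, each simulation repeat falls into exactly one of the two relevant cells. A minimal numeric check in pure Python (hypothetical setup; for a truly nonzero entry, every repeat counts either as a TP or as an FN):

```python
import random

random.seed(0)
ndata = 256  # a power of two, so the fractions below are exact floats
true_edge = True  # consider an entry that is nonzero in the true matrix

# Simulate per-repeat estimates for this entry (nonzero with prob. 0.8):
est = [random.random() < 0.8 for _ in range(ndata)]

tp = sum(e for e in est if true_edge) / ndata
fn = sum(not e for e in est if true_edge) / ndata
print(tp + fn)  # 1.0
```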
9 methods:

1. Correlation matrix
2. Partial correlation matrix
3. Glasso
4. Glasso with Adaptive Lasso penalty
5. Glasso with SCAD penalty
6. Bootstrap Glasso
7. Huge with ric (rotation information criterion)
8. Huge with stars (stability approach to regularization selection)
9. Huge with ebic (extended Bayesian information criterion)
1. Computational speed:

Glasso, Adaptive Lasso and SCAD are very fast methods, even when parameters (such as nset) increase. Bootstrap Glasso is slower than the Glasso-series methods because of resampling (affected mainly by lenboo); increasing the number of bootstrap replicates significantly slows it down. (Taking both computational speed and estimation improvement into account, I select lenboo = 100 when comparing Bootstrap Glasso to other methods: a large lenboo leads to very slow computation, whereas a too-small lenboo may not sufficiently show the advantage of Bootstrap Glasso. Using lenboo = 100 here does not mean I am sure 100 is the best choice for lenboo.)

Moreover, I think lenboo should not be greater than nset. When nset is small and we resample many times, each bootstrap sample is very likely to miss some of the distinct original observations, which can lead to a worse result than simply applying Glasso. If in addition lenboo is larger than nset, we are likely to re-estimate many times on poor resamples (and resamples drawn at different times may be highly similar). Consequently, the advantages of Bootstrap Glasso may be outweighed by its disadvantages when lenboo is (much) larger than nset.

Huge is the slowest method (affected mainly by nset); the 'stars' criterion may be the slowest of all because it requires subsampling, similar in spirit to Bootstrap Glasso.

In terms of computational speed, the ranking is: Glasso, Adaptive Lasso, SCAD > Bootstrap Glasso > Huge (with stars slower than ric and ebic).
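One way to see the concern about bootstrap samples missing original observations: a resample of size nset drawn with replacement contains, on average, only about 63.2% of the distinct original observations (1 − 1/e for large nset), so every bootstrap fit ignores roughly a third of the data. A quick stdlib-Python check (illustrative only, not the report's code):

```python
import random

random.seed(1)
nset = 1000   # sample size, as in the report's largest settings
reps = 200    # number of resamples to average over

fracs = []
for _ in range(reps):
    # Draw a bootstrap sample of indices with replacement:
    sample = [random.randrange(nset) for _ in range(nset)]
    fracs.append(len(set(sample)) / nset)  # fraction of distinct originals

avg = sum(fracs) / reps
print(round(avg, 3))  # close to 1 - 1/e ≈ 0.632
```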
2. Estimation accuracy (fix ndata = 100; set nset = 100, 200, 500, 1000): where ndata is the number of simulations, nset is the sample size in each simulation, and pi is the threshold value for Bootstrap Glasso. Explanations are given both numerically and graphically (values and edges).

2.1 How the methods' performance varies as nset changes (from 100 to 1000), and the different behaviors of AIC, BIC and CVerror.

1./2. For the Correlation matrix and the Partial correlation matrix: when nset increases from 100 to 1000, none of the four criteria change at all: the mean of TP remains at 1, the mean of TN at 0, the mean of FP at 1, and the mean of FN at 0. Additionally, we can still get relatively large nonzero estimated elements even when the true element is exactly zero. Overall, these are two rather poor methods for estimating the true precision matrix (true graph): every nonzero element of the true precision matrix is estimated as nonzero (all existing edges are always included in the estimated graph), and every zero element is also estimated as nonzero (all non-existing edges are included as well, making the estimate far denser than the true graph).
3. For Glasso:

The mean of True Positives is always 1 (the estimates of nonzero elements are always nonzero).

The mean of True Negatives is around 0.2 or 0.3 (for AIC, BIC and CVerror) and shows no significant monotone pattern as nset changes. When nset = 100: BIC, CV > AIC (here "AIC" is short for the mean of the True Negatives under AIC). When nset = 200, 500: BIC > CV > AIC. When nset = 1000: BIC > AIC, CV. Therefore BIC is the best criterion for Glasso in terms of True Negatives (estimates of zero elements being exactly zero).

The mean of False Positives is around 0.7 or 0.8 (for AIC, BIC and CVerror) and shows no significant monotone pattern as nset changes. When nset = 100: BIC, CV < AIC. When nset = 200, 500: BIC < CV < AIC. When nset = 1000: BIC < AIC, CV. Therefore BIC is the best criterion for Glasso in terms of False Positives (least often estimating zero elements as nonzero). (This is consistent with the conclusion from True Negatives, since (mean of) TN + (mean of) FP = 1.)

The mean of False Negatives is always 0 (the estimates of nonzero elements are always nonzero).

4. For Glasso with Adaptive Lasso penalty:

The mean of True Positives is always 1 (the estimates of nonzero elements are always nonzero).

The mean of True Negatives is around 0.5 to 0.8 (for AIC, BIC and CVerror) and increases monotonically with nset. When nset = 100: CV > BIC > AIC (around the 0.65, 0.5 level). When nset = 200: BIC > CV > AIC (around the 0.67, 0.52 level). When nset = 500: BIC > CV > AIC (around the 0.8, 0.7 level). When nset = 1000: BIC > CV, AIC (around the 0.85 level). Therefore CVerror is slightly better when nset is small and BIC is best when nset is large.

The mean of False Positives is around 0.2 to 0.5 (for AIC, BIC and CVerror) and decreases monotonically with nset. When nset = 100: CV < BIC < AIC (around the 0.32, 0.5 level). When nset = 200: BIC < CV < AIC (around the 0.33, 0.52 level). When nset = 500: BIC < CV < AIC (around the 0.2, 0.3 level). When nset = 1000: BIC < CV, AIC (around the 0.14, 0.15 level). Therefore CVerror is slightly better when nset is small and BIC is best when nset is large.

The mean of False Negatives is always 0 (the estimates of nonzero elements are always nonzero).
5. For Glasso with SCAD penalty:

The mean of True Positives is SCAD's biggest difference from the other methods, in the sense that it is never 1, for AIC, BIC and CVerror alike (the estimates of nonzero elements are not always nonzero, which means this method may give an even sparser estimate than the truth, i.e. fewer edges than the true graph). Additionally, the mean of True Positives increases monotonically with nset, which does not happen for the other methods. When nset = 100: BIC, AIC > CV. When nset = 200: CV > AIC, BIC. When nset = 500: CV > AIC > BIC. When nset = 1000: AIC > CV > BIC. The differences among AIC, BIC and CVerror are, however, not large.

The mean of True Negatives is around the 0.7 to 0.97 level and increases monotonically with nset (the only exception is BIC from nset = 500 to 1000, which decreases). When nset = 100: BIC > CV > AIC (around the 0.89, 0.85 to 0.7 level). When nset = 200: BIC > CV > AIC (around the 0.90, 0.86, 0.76 level). When nset = 500: BIC > CV > AIC (around the 0.98, 0.945, 0.938 level). When nset = 1000: BIC > CV > AIC (around the 0.97, 0.96 level). Therefore BIC is the best criterion in terms of True Negatives.

The mean of False Positives is around the 0.29 to 0.02 level and decreases monotonically with nset (one exception is BIC from nset = 500 to 1000, which increases). When nset = 100: BIC < CV < AIC (around the 0.105, 0.14 to 0.29 level). When nset = 200: BIC < CV < AIC (around the 0.09, 0.13, 0.24 level). When nset = 500: BIC < CV < AIC (around the 0.018, 0.05, 0.06 level). When nset = 1000: BIC < CV < AIC (around the 0.02, 0.04, 0.05 level). Therefore BIC is the best criterion in terms of False Positives.

The mean of False Negatives for SCAD is also very different from the others: it is never 0, for any nset and for AIC, BIC and CVerror alike (the estimates of nonzero elements are sometimes zero, which never happens for the other methods). It lies around the 0.01 to 0.16 level. When nset = 100: AIC = BIC < CV (around the 0.01, 0.025 level). When nset = 200: CV < AIC = BIC (around the 0.03, 0.035 level). When nset = 500: AIC < BIC < CV (around the 0.08, 0.085, 0.14 level). When nset = 1000: AIC < CV < BIC (around the 0.13, 0.14, 0.165 level). The differences among AIC, BIC and CVerror for each nset are not large, but the pattern of increase with nset is clear. So we might consider AIC the best criterion based on False Negatives.
6. For Bootstrap Glasso:

The mean of True Positives is always 1 (the estimates of nonzero elements are always nonzero).

The mean of True Negatives increases with pi when nset and lenboo are fixed. For each nset and pi, the impact of lenboo on the mean of True Negatives is not very clear, since I only ran lenboo = 50, 100 for nset = 100, 200 and lenboo = 100 for the remaining nset. (Judging only from these two settings, with nset and pi fixed the mean of True Negatives shows no obvious pattern in lenboo.) For each pi and lenboo, the mean of True Negatives first decreases as nset increases from 100 to 200, then increases as nset increases from 200 to 1000 (for AIC, BIC and CVerror alike). As for the comparison of AIC, BIC and CVerror: for nset = 100, 200 and 500, when pi is small (0.75, 0.8), BIC > CV > AIC, whereas when pi = 0.85, 0.9, 0.95, BIC > AIC > CV. For nset = 1000, for the other pi values we have BIC > CV > AIC, while for pi = 0.95 we have BIC > AIC = CV. Therefore BIC is the best criterion for Bootstrap Glasso based on True Negatives.

The mean of False Positives decreases with pi when nset and lenboo are fixed. For each nset and pi, no effect of lenboo on the mean of False Positives is visible. For each pi and lenboo, the mean of False Positives first increases as nset increases from 100 to 200, then decreases as nset increases from 200 to 1000 (for AIC, BIC and CVerror alike). As for the comparison of AIC, BIC and CVerror: for nset = 100, 200 and 500, when pi is small (0.75, 0.8), BIC < CV < AIC, whereas when pi = 0.85, 0.9, 0.95, BIC < AIC < CV. For nset = 1000, for the other pi values BIC < CV < AIC, while for pi = 0.95, BIC < AIC = CV. Therefore BIC is the best criterion for Bootstrap Glasso based on False Positives.

Based on my previous work and the tables shown above, I think the power to improve True Negatives/False Positives ranks as pi > nset > lenboo, which does not yet take into account the enormous time consumed by a relatively large lenboo. (I therefore suggest using a reasonably large pi and nset, and a not-too-small lenboo, such as 100.) (One question remains to be tested and confirmed: should lenboo be large or small in absolute terms, or relative to nset, i.e. is it actually the ratio lenboo/nset that matters?)

The mean of False Negatives is always 0 (the estimates of nonzero elements are always nonzero).
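The role of pi can be sketched as follows (pure Python with hypothetical data; the report's Bootstrap Glasso keeps an edge only if it is selected in at least a fraction pi of the lenboo bootstrap fits, which is what this toy function does):

```python
def bootstrap_select(edge_indicators, pi):
    """Keep edges whose selection frequency across bootstrap fits is >= pi.

    edge_indicators: list of adjacency matrices (lists of lists of 0/1),
                     one per bootstrap replicate.
    pi:              threshold in (0, 1]; larger pi gives a sparser graph.
    """
    lenboo = len(edge_indicators)
    p = len(edge_indicators[0])
    freq = [[sum(a[i][j] for a in edge_indicators) / lenboo
             for j in range(p)] for i in range(p)]
    return [[1 if freq[i][j] >= pi else 0 for j in range(p)]
            for i in range(p)]

# Edge (0,1) selected in 4 of 5 replicates, edge (0,2) in only 2 of 5:
fits = [
    [[0, 1, 1], [1, 0, 0], [1, 0, 0]],
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
    [[0, 1, 1], [1, 0, 0], [1, 0, 0]],
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
]
print(bootstrap_select(fits, 0.75))  # keeps only edge (0, 1)
```

Raising pi from 0.75 toward 0.95 can only remove edges, which matches the report's observation that True Negatives increase (and False Positives decrease) with pi.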
The following Huge results are based on code that runs Huge alone (not together with the Glasso series, i.e. before Bootstrap Glasso).

7. For Huge with ric (rotation information criterion):

The mean of True Positives is always 1 (the estimates of nonzero elements are always nonzero). The mean of True Negatives shows no obvious monotone pattern as nset changes (0.302, 0.255, …, 0.35 for nset = 100, 200, 500, 1000 respectively). The mean of False Positives likewise shows no obvious monotone pattern (…, 0.83, …, … for nset = 100, 200, 500, 1000 respectively). The mean of False Negatives is always 0 (the estimates of nonzero elements are always nonzero).

8. For Huge with stars (stability approach to regularization selection):

The mean of True Positives is always 1 (the estimates of nonzero elements are always nonzero). The mean of True Negatives increases monotonically as nset increases from 100 to 1000 (…, 0.29, …, … respectively). The mean of False Positives decreases monotonically as nset increases from 100 to 1000 (…, 0.71, …, … respectively). The mean of False Negatives is always 0 (the estimates of nonzero elements are never exactly zero).

9. For Huge with ebic (extended Bayesian information criterion):

The mean of True Positives is always 1 (the estimates of nonzero elements are always nonzero). The mean of True Negatives shows no obvious monotone pattern as nset changes (0.4, …, …, … respectively). The mean of False Positives shows no obvious monotone pattern (0.6, …, …, … respectively). The mean of False Negatives is always 0 (the estimates of nonzero elements are always nonzero).
2.2 Comparison among methods, based on True Negatives/False Positives. ">" means better (larger True Negatives / smaller False Positives).

When nset = 100: Boo Glasso + pi(0.95) + BIC > SCAD + BIC > Boo Glasso + pi(0.95) + AIC > SCAD + CV > Boo Glasso + pi(0.9) + BIC > Boo Glasso + pi(0.95) + CV > Boo Glasso + pi(0.85) + BIC > SCAD + AIC > Boo Glasso + pi(0.9) + AIC > Adaptive Lasso + CV > Adaptive Lasso + BIC = Boo Glasso + pi(0.8) + BIC > Boo Glasso + pi(0.9) + CV > Adaptive Lasso + AIC > Boo Glasso + pi(0.75) + BIC > Boo Glasso + pi(0.85) + AIC > Huge + ebic > Boo Glasso + pi(0.85) + CV > Boo Glasso + pi(0.8) + CV > Boo Glasso + pi(0.75) + CV = Glasso + CV > Glasso + BIC > Huge + ric > Boo Glasso + pi(0.8) + AIC > Huge + stars > Boo Glasso + pi(0.75) + AIC > Glasso + AIC > (Partial) Correlation matrix.

When nset = 200: Boo Glasso + pi(0.95) + BIC > SCAD + BIC > SCAD + CV > Boo Glasso + pi(0.9) + BIC > Boo Glasso + pi(0.95) + AIC > Boo Glasso + pi(0.95) + CV > SCAD + AIC > Boo Glasso + pi(0.85) + BIC > Adaptive Lasso + BIC > Adaptive Lasso + CV > Boo Glasso + pi(0.8) + BIC > Boo Glasso + pi(0.9) + AIC > Adaptive Lasso + AIC > Boo Glasso + pi(0.9) + CV > Boo Glasso + pi(0.75) + BIC > Huge + ebic > Boo Glasso + pi(0.85) + AIC > Glasso + BIC > Huge + stars > Boo Glasso + pi(0.85) + CV > Huge + ric > Glasso + CV = Boo Glasso + pi(0.75) + CV = Boo Glasso + pi(0.8) + CV > Boo Glasso + pi(0.8) + AIC > Boo Glasso + pi(0.75) + AIC > Glasso + AIC > (Partial) Correlation matrix.

When nset = 500: SCAD + BIC > Boo Glasso + pi(0.95) + BIC > SCAD + CV > SCAD + AIC > Boo Glasso + pi(0.95) + AIC > Boo Glasso + pi(0.95) + CV > Boo Glasso + pi(0.9) + BIC > Adaptive Lasso + BIC > Boo Glasso + pi(0.85) + BIC > Boo Glasso + pi(0.9) + AIC > Boo Glasso + pi(0.9) + CV > Adaptive Lasso + CV > Adaptive Lasso + AIC > Boo Glasso + pi(0.8) + BIC > Boo Glasso + pi(0.75) + BIC > Boo Glasso + pi(0.85) + AIC > Boo Glasso + pi(0.85) + CV > Huge + stars = Huge + ebic > Glasso + BIC > Huge + ric > Boo Glasso + pi(0.8) + CV > Boo Glasso + pi(0.8) + AIC > Boo Glasso + pi(0.75) + CV > Glasso + CV > Boo Glasso + pi(0.75) + AIC > Glasso + AIC > (Partial) Correlation matrix.

When nset = 1000: SCAD + BIC > SCAD + CV > SCAD + AIC > Boo Glasso + pi(0.95) + BIC > Boo Glasso + pi(0.95) + CV = Boo Glasso + pi(0.95) + AIC > Boo Glasso + pi(0.9) + BIC > Adaptive Lasso + BIC > Adaptive Lasso + AIC = Adaptive Lasso + CV > Boo Glasso + pi(0.9) + CV > Boo Glasso + pi(0.9) + AIC > Boo Glasso + pi(0.85) + BIC > Boo Glasso + pi(0.8) + BIC > Boo Glasso + pi(0.85) + CV > Boo Glasso + pi(0.85) + AIC > Boo Glasso + pi(0.75) + BIC > Boo Glasso + pi(0.8) + CV > Boo Glasso + pi(0.8) + AIC > Huge + stars = Huge + ebic > Glasso + BIC > Huge + ric > Boo Glasso + pi(0.75) + CV > Boo Glasso + pi(0.75) + AIC > Glasso + CV > Glasso + AIC > (Partial) Correlation matrix.
Remarks:

According to the rankings above, when nset = 1000 and ndata = 100, even if lenboo increases to 200 with pi = 0.95, SCAD still performs better than Bootstrap Glasso.

Astonishingly, when nset is small (such as 100), lenboo = 50 performs better than lenboo = 100 or 200; but when nset = 200, lenboo = 200 performs better than lenboo = 50 and 100. Therefore I think lenboo should, at the largest, not be greater than nset.

BIC gives the best estimate compared to AIC and CVerror most of the time. Choosing BIC can, to a certain degree, offset a method's shortcomings and the speed penalty of large parameters. So, surprisingly, using BIC is sometimes a better choice than switching to another, better method or changing parameter values: even without a large pi, simply using the BIC criterion can still give a good estimate.

It seems that when nset is large, pi is large and lenboo is not too small, AIC has the same True Negatives/False Positives as CV. It also seems that when nset is relatively large (500 or 1000), Huge+stars has the same True Negatives/False Positives as Huge+ebic.

For large nset, SCAD is more efficient than Bootstrap Glasso with large pi (even SCAD + CV can beat Boo Glasso + pi(0.95) + BIC, despite our conclusion that BIC is the best criterion); SCAD and Adaptive Lasso start to triumph over Bootstrap Glasso with large pi. The original Glasso + BIC can also outperform Bootstrap Glasso + pi(0.75) + CV or AIC (affirming the superiority of BIC over AIC and CV).
3. Questions and Problems:

1. When running the Adaptive Lasso penalty and SCAD penalty code in R, I always get warning messages like these:

1: In lamhat <= lam : longer object length is not a multiple of shorter object length
2: In a * lam - lamhat : longer object length is not a multiple of shorter object length
3: In lamhat > lam : longer object length is not a multiple of shorter object length
4: In pmax(a * lam - lamhat, 0) * (lamhat > lam)/(a - 1)/lam : longer object length is not a multiple of shorter object length
5: In lam * ((lamhat <= lam) + pmax(a * lam - lamhat, 0) * ... : longer object length is not a multiple of shorter object length

(These are R's vector-recycling warnings: lamhat and lam have different lengths, and the longer length is not a multiple of the shorter one, so R recycles the shorter vector and warns.)

2. BIC TN and FP for Bootstrap Glasso are always slightly different when the Bootstrap Glasso code is run alone versus together with Glasso, Adaptive Glasso and SCAD. These two ways of running SOMETIMES give a few different estimated precision matrices, which does not affect the AIC, CVerror, or BIC TP and FN results, but only the BIC TN and FP for the original Glasso (not the Bootstrap Glasso). (I DON'T KNOW WHY! Are these errors within an allowable range?) The difference in the means of TN and FP is slight, on the order of about 0.001 (0.004 when nset = 100, and when nset = 500).

3. Even using the same sample data (xx, equivalent to x_j, the data of one simulation), running Huge + ric several times gives different results!! (I DON'T KNOW WHY!)

4. Possibly because of Problem 2, whenever Huge is applied AFTER Bootstrap Glasso, the Huge results change (e.g., giving result b).

5. Applying Huge after the correlation matrix, partial correlation matrix, Glasso, Adaptive Glasso and SCAD does not change the Huge results (e.g., giving result a).

6. Applying Huge before Bootstrap Glasso does not change the Huge results either (e.g., giving result a).

7. Therefore I prefer result a, which is what is obtained most of the time. Additionally, I think it is more reliable, since it is obtained directly from the original data in the first run: if we set.seed again and get the same data again, then without running any other commands we get Huge result a (and the results start to change if we do not set the same seed again before applying Huge).
Appendix

nset = 200, ndata = 200

Columns: TP, TN, FP and FN (as defined above), each under AIC, BIC and CV.
Rows: correlation; partial corr; Glasso; Adaptive; SCAD; Boo Glasso with lenboo = 100 (two pi values); huge+ric; huge+stars; huge+ebic.
[Numeric entries did not survive extraction.]
When increasing the number of simulations:

nset = 100, ndata = 200

Columns: TP, TN, FP and FN (as defined above), each under AIC, BIC and CV.
Rows: correlation; partial corr; Glasso; Adaptive; SCAD; huge+ric; huge+stars; huge+ebic.
[Numeric entries did not survive extraction.]

nset = 100, ndata = 500

Same columns and rows as the table above.
[Numeric entries did not survive extraction.]
nset = 100, ndata = 1000

Columns: TP, TN, FP and FN (as defined above), each under AIC, BIC and CV.
Rows: correlation; partial corr; Glasso; Adaptive; SCAD; huge+ric; huge+stars; huge+ebic.
[Numeric entries did not survive extraction.]
Estimating Sparse Graphical Models: Insights Through Simulation. Yunan Zhu. Master of Science, Statistics.
Published in: Advances in Neural Information Processing Systems 8, D S Touretzky, M C Mozer, and M E Hasselmo (eds.), MIT Press, Cambridge, MA, pages 190-196, 1996. Learning with Ensembles: How over-tting
More information2 Nils Andersson and Kostas D. Kokkotas Moreover, the w-mode spectra are qualitatively similar for axial and polar perturbations (for a description of
Mon. Not. R. Astron. Soc. 000, 000{000 (1997) Pulsation modes for increasingly relativistic polytropes Nils Andersson 1 and Kostas D. Kokkotas 2 1 Department of Physics, Washington University, St Louis
More informationWhat Every Programmer Should Know About Floating-Point Arithmetic DRAFT. Last updated: November 3, Abstract
What Every Programmer Should Know About Floating-Point Arithmetic Last updated: November 3, 2014 Abstract The article provides simple answers to the common recurring questions of novice programmers about
More informationFeature selection with high-dimensional data: criteria and Proc. Procedures
Feature selection with high-dimensional data: criteria and Procedures Zehua Chen Department of Statistics & Applied Probability National University of Singapore Conference in Honour of Grace Wahba, June
More informationDirect Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina
Direct Learning: Linear Regression Parametric learning We consider the core function in the prediction rule to be a parametric function. The most commonly used function is a linear function: squared loss:
More informationPerformance Evaluation
Performance Evaluation Confusion Matrix: Detected Positive Negative Actual Positive A: True Positive B: False Negative Negative C: False Positive D: True Negative Recall or Sensitivity or True Positive
More informationDepartment of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim
Tests for trend in more than one repairable system. Jan Terje Kvaly Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim ABSTRACT: If failure time data from several
More informationEvaluation & Credibility Issues
Evaluation & Credibility Issues What measure should we use? accuracy might not be enough. How reliable are the predicted results? How much should we believe in what was learned? Error on the training data
More informationAchilles: Now I know how powerful computers are going to become!
A Sigmoid Dialogue By Anders Sandberg Achilles: Now I know how powerful computers are going to become! Tortoise: How? Achilles: I did curve fitting to Moore s law. I know you are going to object that technological
More informationExponential Functions and Graphs - Grade 11 *
OpenStax-CNX module: m30856 1 Exponential Functions and Graphs - Grade 11 * Rory Adams Free High School Science Texts Project Heather Williams This work is produced by OpenStax-CNX and licensed under the
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write
More information1 Matrices and Systems of Linear Equations
Linear Algebra (part ) : Matrices and Systems of Linear Equations (by Evan Dummit, 207, v 260) Contents Matrices and Systems of Linear Equations Systems of Linear Equations Elimination, Matrix Formulation
More informationStability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models
Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models Han Liu Kathryn Roeder Larry Wasserman Carnegie Mellon University Pittsburgh, PA 15213 Abstract A challenging
More informationStatistical Inference
Statistical Inference Bernhard Klingenberg Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Outline Estimation: Review of concepts
More informationConvergence Complexity of Optimistic Rate Based Flow. Control Algorithms. Computer Science Department, Tel-Aviv University, Israel
Convergence Complexity of Optimistic Rate Based Flow Control Algorithms Yehuda Afek y Yishay Mansour z Zvi Ostfeld x Computer Science Department, Tel-Aviv University, Israel 69978. December 12, 1997 Abstract
More informationComputational Statistics with Application to Bioinformatics. Unit 12: Maximum Likelihood Estimation (MLE) on a Statistical Model
Computational Statistics with Application to Bioinformatics Prof. William H. Press Spring Term, 2008 The University of Texas at Austin Unit 12: Maximum Likelihood Estimation (MLE) on a Statistical Model
More informationCPSC 320 Sample Solution, Reductions and Resident Matching: A Residentectomy
CPSC 320 Sample Solution, Reductions and Resident Matching: A Residentectomy August 25, 2017 A group of residents each needs a residency in some hospital. A group of hospitals each need some number (one
More informationThe Growth of Functions. A Practical Introduction with as Little Theory as possible
The Growth of Functions A Practical Introduction with as Little Theory as possible Complexity of Algorithms (1) Before we talk about the growth of functions and the concept of order, let s discuss why
More informationMITOCW watch?v=t6tqhnxy5wg
MITOCW watch?v=t6tqhnxy5wg PROFESSOR: So what are we trying to do? We're going to try to write a matter wave. We have a particle with energy e and momentum p. e is equal to h bar omega. So you can get
More informationLeast Squares Classification
Least Squares Classification Stephen Boyd EE103 Stanford University November 4, 2017 Outline Classification Least squares classification Multi-class classifiers Classification 2 Classification data fitting
More informationMore Asymptotic Analysis Spring 2018 Discussion 8: March 6, 2018
CS 61B More Asymptotic Analysis Spring 2018 Discussion 8: March 6, 2018 Here is a review of some formulas that you will find useful when doing asymptotic analysis. ˆ N i=1 i = 1 + 2 + 3 + 4 + + N = N(N+1)
More informationLinear Algebra (part 1) : Matrices and Systems of Linear Equations (by Evan Dummit, 2016, v. 2.02)
Linear Algebra (part ) : Matrices and Systems of Linear Equations (by Evan Dummit, 206, v 202) Contents 2 Matrices and Systems of Linear Equations 2 Systems of Linear Equations 2 Elimination, Matrix Formulation
More informationBayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson
Bayesian variable selection via penalized credible regions Brian Reich, NC State Joint work with Howard Bondell and Ander Wilson Brian Reich, NCSU Penalized credible regions 1 Motivation big p, small n
More informationMath 1270 Honors ODE I Fall, 2008 Class notes # 14. x 0 = F (x; y) y 0 = G (x; y) u 0 = au + bv = cu + dv
Math 1270 Honors ODE I Fall, 2008 Class notes # 1 We have learned how to study nonlinear systems x 0 = F (x; y) y 0 = G (x; y) (1) by linearizing around equilibrium points. If (x 0 ; y 0 ) is an equilibrium
More informationLinear Model Selection and Regularization
Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In
More informationGroup exponential penalties for bi-level variable selection
for bi-level variable selection Department of Biostatistics Department of Statistics University of Kentucky July 31, 2011 Introduction In regression, variables can often be thought of as grouped: Indicator
More informationModel Accuracy Measures
Model Accuracy Measures Master in Bioinformatics UPF 2017-2018 Eduardo Eyras Computational Genomics Pompeu Fabra University - ICREA Barcelona, Spain Variables What we can measure (attributes) Hypotheses
More informationEvaluating Classifiers. Lecture 2 Instructor: Max Welling
Evaluating Classifiers Lecture 2 Instructor: Max Welling Evaluation of Results How do you report classification error? How certain are you about the error you claim? How do you compare two algorithms?
More informationOf small numbers with big influence The Sum Of Squares
Of small numbers with big influence The Sum Of Squares Dr. Peter Paul Heym Sum Of Squares Often, the small things make the biggest difference in life. Sometimes these things we do not recognise at first
More informationSatisfaction Equilibrium: Achieving Cooperation in Incomplete Information Games
Satisfaction Equilibrium: Achieving Cooperation in Incomplete Information Games Stéphane Ross and Brahim Chaib-draa Department of Computer Science and Software Engineering Laval University, Québec (Qc),
More informationSparse Permutation Invariant Covariance Estimation: Final Talk
Sparse Permutation Invariant Covariance Estimation: Final Talk David Prince Biostat 572 dprince3@uw.edu May 31, 2012 David Prince (UW) SPICE May 31, 2012 1 / 19 Electronic Journal of Statistics Vol. 2
More informationSTAT 535 Lecture 5 November, 2018 Brief overview of Model Selection and Regularization c Marina Meilă
STAT 535 Lecture 5 November, 2018 Brief overview of Model Selection and Regularization c Marina Meilă mmp@stat.washington.edu Reading: Murphy: BIC, AIC 8.4.2 (pp 255), SRM 6.5 (pp 204) Hastie, Tibshirani
More informationε ε
The 8th International Conference on Computer Vision, July, Vancouver, Canada, Vol., pp. 86{9. Motion Segmentation by Subspace Separation and Model Selection Kenichi Kanatani Department of Information Technology,
More informationAnalysis of the Performance of AdaBoost.M2 for the Simulated Digit-Recognition-Example
Analysis of the Performance of AdaBoost.M2 for the Simulated Digit-Recognition-Example Günther Eibl and Karl Peter Pfeiffer Institute of Biostatistics, Innsbruck, Austria guenther.eibl@uibk.ac.at Abstract.
More informationEssentials of Intermediate Algebra
Essentials of Intermediate Algebra BY Tom K. Kim, Ph.D. Peninsula College, WA Randy Anderson, M.S. Peninsula College, WA 9/24/2012 Contents 1 Review 1 2 Rules of Exponents 2 2.1 Multiplying Two Exponentials
More informationAnnouncements. Problem Set 1 out. Checkpoint due Monday, September 30. Remaining problems due Friday, October 4.
Indirect Proofs Announcements Problem Set 1 out. Checkpoint due Monday, September 30. Grade determined by attempt rather than accuracy. It's okay to make mistakes we want you to give it your best effort,
More informationAn algorithm for solving the graph isomorphism problem
An algorithm for solving the graph isomorphism problem By Lucas Allen Contents -introduction -The problem -The algorithm -Complexity -Examples *Example 1 *Example 2 *Example 3 *Example 4 -Conclusion Introduction
More informationDecision Support. Dr. Johan Hagelbäck.
Decision Support Dr. Johan Hagelbäck johan.hagelback@lnu.se http://aiguy.org Decision Support One of the earliest AI problems was decision support The first solution to this problem was expert systems
More informationIE 5531: Engineering Optimization I
IE 5531: Engineering Optimization I Lecture 15: Nonlinear optimization Prof. John Gunnar Carlsson November 1, 2010 Prof. John Gunnar Carlsson IE 5531: Engineering Optimization I November 1, 2010 1 / 24
More informationPhysics 509: Error Propagation, and the Meaning of Error Bars. Scott Oser Lecture #10
Physics 509: Error Propagation, and the Meaning of Error Bars Scott Oser Lecture #10 1 What is an error bar? Someone hands you a plot like this. What do the error bars indicate? Answer: you can never be
More informationExtended Bayesian Information Criteria for Gaussian Graphical Models
Extended Bayesian Information Criteria for Gaussian Graphical Models Rina Foygel University of Chicago rina@uchicago.edu Mathias Drton University of Chicago drton@uchicago.edu Abstract Gaussian graphical
More informationMITOCW watch?v=wr88_vzfcx4
MITOCW watch?v=wr88_vzfcx4 PROFESSOR: So we're building this story. We had the photoelectric effect. But at this moment, Einstein, in the same year that he was talking about general relativity, he came
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationAn Attempt To Understand Tilling Approach In proving The Littlewood Conjecture
An Attempt To Understand Tilling Approach In proving The Littlewood Conjecture 1 Final Report For Math 899- Dr. Cheung Amira Alkeswani Dr. Cheung I really appreciate your acceptance in adding me to your
More informationOn High-Dimensional Cross-Validation
On High-Dimensional Cross-Validation BY WEI-CHENG HSIAO Institute of Statistical Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 11529, Taiwan hsiaowc@stat.sinica.edu.tw 5 WEI-YING
More informationComputer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo
Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain
More informationDiscrete Probability and State Estimation
6.01, Spring Semester, 2008 Week 12 Course Notes 1 MASSACHVSETTS INSTITVTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science 6.01 Introduction to EECS I Spring Semester, 2008 Week
More informationChapter 11 - Sequences and Series
Calculus and Analytic Geometry II Chapter - Sequences and Series. Sequences Definition. A sequence is a list of numbers written in a definite order, We call a n the general term of the sequence. {a, a
More information198:538 Complexity of Computation Lecture 16 Rutgers University, Spring March 2007
198:538 Complexity of Computation Lecture 16 Rutgers University, Spring 2007 8 March 2007 In this lecture we discuss Shamir s theorem that PSPACE is the set of languages that have interactive proofs with
More informationLecture 15 - NP Completeness 1
CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) February 29, 2018 Lecture 15 - NP Completeness 1 In the last lecture we discussed how to provide
More informationPredicting Protein Interactions with Motifs
Predicting Protein Interactions with Motifs Jessica Long Chetan Sharma Lekan Wang December 12, 2008 1 Background Proteins are essential to almost all living organisms. They are comprised of a long, tangled
More informationChapter 1 Statistical Inference
Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations
More informationComplex Matrix Transformations
Gama Network Presents: Complex Matrix Transformations By By Scott Johnson Gamasutra May 17, 2002 URL: http://www.gamasutra.com/features/20020510/johnson_01.htm Matrix transforms are a ubiquitous aspect
More informationMITOCW ocw f99-lec05_300k
MITOCW ocw-18.06-f99-lec05_300k This is lecture five in linear algebra. And, it will complete this chapter of the book. So the last section of this chapter is two point seven that talks about permutations,
More informationPerformance Evaluation
Performance Evaluation David S. Rosenberg Bloomberg ML EDU October 26, 2017 David S. Rosenberg (Bloomberg ML EDU) October 26, 2017 1 / 36 Baseline Models David S. Rosenberg (Bloomberg ML EDU) October 26,
More informationA vector from the origin to H, V could be expressed using:
Linear Discriminant Function: the linear discriminant function: g(x) = w t x + ω 0 x is the point, w is the weight vector, and ω 0 is the bias (t is the transpose). Two Category Case: In the two category
More informationAn Introduction to Path Analysis
An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving
More informationvalue of the sum standard units
Stat 1001 Winter 1998 Geyer Homework 7 Problem 18.1 20 and 25. Problem 18.2 (a) Average of the box. (1+3+5+7)=4=4. SD of the box. The deviations from the average are,3,,1, 1, 3. The squared deviations
More informationPart I: Preliminary Results. Pak K. Chan, Martine Schlag and Jason Zien. Computer Engineering Board of Studies. University of California, Santa Cruz
Spectral K-Way Ratio-Cut Partitioning Part I: Preliminary Results Pak K. Chan, Martine Schlag and Jason Zien Computer Engineering Board of Studies University of California, Santa Cruz May, 99 Abstract
More informationLinear Regression and Its Applications
Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start
More informationSelecting an Orthogonal or Nonorthogonal Two-Level Design for Screening
Selecting an Orthogonal or Nonorthogonal Two-Level Design for Screening David J. Edwards 1 (with Robert W. Mee 2 and Eric D. Schoen 3 ) 1 Virginia Commonwealth University, Richmond, VA 2 University of
More informationPhysics 509: Bootstrap and Robust Parameter Estimation
Physics 509: Bootstrap and Robust Parameter Estimation Scott Oser Lecture #20 Physics 509 1 Nonparametric parameter estimation Question: what error estimate should you assign to the slope and intercept
More informationCPSC 320 Sample Solution, The Stable Marriage Problem
CPSC 320 Sample Solution, The Stable Marriage Problem September 10, 2016 This is a sample solution that illustrates how we might solve parts of this worksheet. Your answers may vary greatly from ours and
More informationLinear regression methods
Linear regression methods Most of our intuition about statistical methods stem from linear regression. For observations i = 1,..., n, the model is Y i = p X ij β j + ε i, j=1 where Y i is the response
More informationMITOCW watch?v=vu_of9tcjaa
MITOCW watch?v=vu_of9tcjaa The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To
More informationContents 1 Introduction 4 2 Go and genetic programming 4 3 Description of the go board evaluation function 4 4 Fitness Criteria for tness : : :
Go and Genetic Programming Playing Go with Filter Functions S.F. da Silva November 21, 1996 1 Contents 1 Introduction 4 2 Go and genetic programming 4 3 Description of the go board evaluation function
More informationImproved Holt Method for Irregular Time Series
WDS'08 Proceedings of Contributed Papers, Part I, 62 67, 2008. ISBN 978-80-7378-065-4 MATFYZPRESS Improved Holt Method for Irregular Time Series T. Hanzák Charles University, Faculty of Mathematics and
More informationStochastic dominance with imprecise information
Stochastic dominance with imprecise information Ignacio Montes, Enrique Miranda, Susana Montes University of Oviedo, Dep. of Statistics and Operations Research. Abstract Stochastic dominance, which is
More informationPermutation-invariant regularization of large covariance matrices. Liza Levina
Liza Levina Permutation-invariant covariance regularization 1/42 Permutation-invariant regularization of large covariance matrices Liza Levina Department of Statistics University of Michigan Joint work
More informationQ1 (12 points): Chap 4 Exercise 3 (a) to (f) (2 points each)
Q1 (1 points): Chap 4 Exercise 3 (a) to (f) ( points each) Given a table Table 1 Dataset for Exercise 3 Instance a 1 a a 3 Target Class 1 T T 1.0 + T T 6.0 + 3 T F 5.0-4 F F 4.0 + 5 F T 7.0-6 F T 3.0-7
More informationRelative Improvement by Alternative Solutions for Classes of Simple Shortest Path Problems with Uncertain Data
Relative Improvement by Alternative Solutions for Classes of Simple Shortest Path Problems with Uncertain Data Part II: Strings of Pearls G n,r with Biased Perturbations Jörg Sameith Graduiertenkolleg
More informationReading and Writing. Mathematical Proofs. Slides by Arthur van Goetham
Reading and Writing Mathematical Proofs Slides by Arthur van Goetham What is a proof? Why explanations are not proofs What is a proof? A method for establishing truth What establishes truth depends on
More informationWe're in interested in Pr{three sixes when throwing a single dice 8 times}. => Y has a binomial distribution, or in official notation, Y ~ BIN(n,p).
Sampling distributions and estimation. 1) A brief review of distributions: We're in interested in Pr{three sixes when throwing a single dice 8 times}. => Y has a binomial distribution, or in official notation,
More informationDay 4: Shrinkage Estimators
Day 4: Shrinkage Estimators Kenneth Benoit Data Mining and Statistical Learning March 9, 2015 n versus p (aka k) Classical regression framework: n > p. Without this inequality, the OLS coefficients have
More informationOn the Structure of Low Autocorrelation Binary Sequences
On the Structure of Low Autocorrelation Binary Sequences Svein Bjarte Aasestøl University of Bergen, Bergen, Norway December 1, 2005 1 blank 2 Contents 1 Introduction 5 2 Overview 5 3 Denitions 6 3.1 Shift
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationReport on article Universal Quantum Simulator by Seth Lloyd, 1996
Report on article Universal Quantum Simulator by Seth Lloyd, 1996 Louis Duvivier Contents 1 Context and motivations 1 1.1 Quantum computer.......................... 2 1.2 Quantum simulation.........................
More informationChapter 6. Estimation of Confidence Intervals for Nodal Maximum Power Consumption per Customer
Chapter 6 Estimation of Confidence Intervals for Nodal Maximum Power Consumption per Customer The aim of this chapter is to calculate confidence intervals for the maximum power consumption per customer
More informationMEASURING COMPATIBILITY (CLOSENESS) IN WEIGHTED ENVIRONMENTS. WHEN CLOSE REALLY MEANS CLOSE?
MEASURING COMPATIBILITY (CLOSENESS) IN WEIGHTED ENVIRONMENTS. WHEN CLOSE REALLY MEANS CLOSE? Claudio Garuti General Manager Fulcrum Engineering Santiago Chile Claudiogaruti@fulcrum.cl Keywords: Compatibility,
More informationExtended Bayesian Information Criteria for Model Selection with Large Model Spaces
Extended Bayesian Information Criteria for Model Selection with Large Model Spaces Jiahua Chen, University of British Columbia Zehua Chen, National University of Singapore (Biometrika, 2008) 1 / 18 Variable
More information1 Introduction A priority queue is a data structure that maintains a set of elements and supports operations insert, decrease-key, and extract-min. Pr
Buckets, Heaps, Lists, and Monotone Priority Queues Boris V. Cherkassky Central Econ. and Math. Inst. Krasikova St. 32 117418, Moscow, Russia cher@cemi.msk.su Craig Silverstein y Computer Science Department
More information