AI Memo: Permutation Tests for Classification


Sayan Mukherjee
Whitehead/MIT Center for Genome Research and
Center for Biological and Computational Learning,
Massachusetts Institute of Technology, Cambridge, MA 02139, USA

Polina Golland
Computer Science and Artificial Intelligence Laboratory,
Massachusetts Institute of Technology, Cambridge, MA 02139, USA

Dmitry Panchenko
Department of Mathematics,
Massachusetts Institute of Technology, Cambridge, MA 02145, USA

Abstract

We introduce and explore an approach to estimating statistical significance of classification accuracy, which is particularly useful in scientific applications of machine learning where the high dimensionality of the data and the small number of training examples render most standard convergence bounds too loose to yield a meaningful guarantee of the generalization ability of the classifier. Instead, we estimate statistical significance of the observed classification accuracy, or the likelihood of observing such accuracy by chance due to spurious correlations of the high-dimensional data patterns with the class labels in the given training set. We adopt permutation testing, a non-parametric technique previously developed in classical statistics for hypothesis testing in the generative setting (i.e., comparing two probability distributions). We demonstrate the method on real examples from neuroimaging studies and DNA microarray analysis and suggest a theoretical analysis of the procedure that relates the asymptotic behavior of the test to the existing convergence bounds.

Keywords: Classification, Permutation testing, Statistical significance, Non-parametric tests, Rademacher processes.

Acknowledgments

The authors would like to thank Pablo Tamayo, Vladimir Koltchinskii, Jill Mesirov, Todd Golub and Bruce Fischl for useful discussions. Sayan Mukherjee was supported by a SLOAN/DOE grant. Polina Golland was supported in part by an NSF IIS grant and an Athinoula A. Martinos Center for Biomedical Imaging collaborative research grant. Dmitry Panchenko was partially supported by an AT&T Buchsbaum grant. The authors would like to acknowledge Dr. M. Spiridon and Dr. N. Kanwisher for providing the fMRI data, Dr. R. Buckner for providing the cortical thickness data, and Dr. D. Greve and Dr. B. Fischl for help with registration and feature extraction in the MRI experiments discussed in this paper. Dr. Kanwisher would like to acknowledge EY 3455 and MH 5950 grants. Dr. Buckner would like to acknowledge the assistance of the Washington University ADRC, the James S. McDonnell Foundation, the Alzheimer's Association, and NIA grants AG05682 and AG0399. Dr. B. Fischl would like to acknowledge NIH grant R0 RR6594-0A. The Human Brain Project/Neuroinformatics research is funded jointly by the NINDS, the NIMH and the NCI (R0-NS3958). Further support was provided by the NCRR (P4-RR4075 and R0-RR3609).

1. Introduction

Many scientific studies involve detection and characterization of predictive patterns in high dimensional measurements, which can often be reduced to training a binary classifier or a regression model. We will use two examples of such applications to illustrate the techniques in this paper: medical image studies and gene expression analysis. Image-based clinical studies of brain disorders attempt to detect neuroanatomical changes induced by diseases, as well as predict development of the disease. The goals of gene expression analysis include classification of the tissue morphology and prediction of the treatment outcome from DNA microarray data. In both fields, training a classifier to reliably label new examples into the healthy population or one of the disease sub-groups can help to improve screening and early diagnostics, as well as provide an insight into the nature of the disorder. Both imaging data and DNA microarray measurements are characterized by high dimensionality of the input space (thousands of features) and small datasets (tens of independent examples), typical of many biological applications.

For statistical learning to be useful in such scientific applications, it must provide an estimate of the significance of the detected differences, i.e., a guarantee of how well the results of learning describe the entire population. Machine learning theory offers two types of such guarantees, both based on estimating the expected error of the resulting classifier function. The first approach is to estimate the test error on a hold-out set or by applying a cross-validation procedure, such as a jackknife or bootstrap (Efron, 1982), which, in conjunction with a variance-based convergence bound, provides a confidence interval (i.e., the interval that with high probability contains the true value) for the expected error. Small sample sizes render this approach ineffective, as the variance of the error on a hold-out set is often too large to provide a meaningful estimate of how close we are to the true error. Applying variance-based bounds to the cross-validation error estimates produces misleading results, as the cross-validation iterations are not independent, causing us to underestimate the variance. An alternative, but equally fruitless, approach is to use bounds on the convergence of the empirical training error to the expected test error. For very high dimensional data, the training error is always zero and the bounds are extremely loose. Thus we are often forced to conclude that, although the classification accuracy looks promising, we need significantly more data (several orders of magnitude more than currently available) before the standard bounds provide a meaningful confidence interval for the expected error (Guyon et al., 1998). Since collecting data in these applications is often expensive in terms of time and resources, it is desirable to obtain a quantitative indicator of how robust the observed classification accuracy is, long before the asymptotic bounds apply, for training set sizes in the tens to hundreds of examples. This is particularly relevant since the empirical results often indicate that efficient learning is possible with far fewer examples than predicted by the convergence bounds.

In this paper, we demonstrate how a weaker guarantee, that of statistical significance, can still be provided for classification results on a small number of examples. We consider the question of the differences between the two classes, as measured by the classifier performance on a test set, in the framework of hypothesis testing traditionally used in statistics.

Intuitively, statistical significance is a measure of how likely we were to obtain the observed test accuracy by chance, only because the training algorithm identified some pattern in the high-dimensional data that happened to correlate with the class labels as an artifact of a small data set size.

Our goal is to reject the null hypothesis, namely that a given family of classifiers cannot learn to accurately predict the labels of a test point given a training set. The empirical estimate of the expected test error can serve as a test statistic that measures how different the two classes are with respect to the family of classifiers we use in training. We adopt permutation tests (Good, 1994, Kendall, 1945), a non-parametric technique developed in the statistics literature, to estimate the probability distribution of the statistic under the null hypothesis from the available data. We employ the permutation procedure to construct an empirical estimate of the error distribution and then use this estimate to assign a significance to the observed classification error.

In the next section, we provide the necessary background on hypothesis testing and permutation tests. In Section 3, we extend the permutation procedure to estimate statistical significance of classification results. Section 4 demonstrates the application of the test on two detailed examples, reports results for several studies from the fields of brain imaging and gene expression analysis, and offers practical guidelines for applying the procedure. In Section 5, we suggest a theoretical analysis of the procedure that leads to convergence bounds governed by quantities similar to those that control standard empirical error bounds, closing with a brief discussion of open questions.

2. Background: Hypothesis Testing and Permutations

In two-class comparison hypothesis testing, the differences between two data distributions are measured using a dataset statistic T : (R^n × {-1, 1})^ℓ → R, such that for a given dataset {(x_k, y_k)}_{k=1}^ℓ, where the x_k ∈ R^n are observations and the y_k ∈ {-1, 1} are the corresponding class labels, T(x_1, y_1, ..., x_ℓ, y_ℓ) is a measure of the similarity of the subsets {x_k | y_k = 1} and {x_k | y_k = -1}. The null hypothesis typically assumes that the two conditional probability distributions are identical, p(x|y = 1) = p(x|y = -1), or equivalently, that the data and the labels are independent, p(x, y) = p(x)p(y). The goal of the hypothesis test is to reject the null hypothesis at a certain level of significance α, which sets the maximal acceptable probability of a false positive (declaring that the classes are different when the null hypothesis is true). For any value of the statistic, the corresponding p-value is the highest level of significance at which the null hypothesis can still be rejected.

In classical statistics, the data are often assumed to be one-dimensional (n = 1). For example, in the two-sample t-test, the data in the two classes are assumed to be generated by one-dimensional Gaussian distributions of equal variance. The null hypothesis is that the distributions have the same mean. The distribution of the t-statistic, the difference between the sample means normalized by the standard error, under the null hypothesis is Student's distribution (Sachs, 1984, Student, 1908). If the integral of Student's distribution over the values higher than the observed t-statistic is smaller than the desired significance level α, we reject the null hypothesis in favor of the alternative hypothesis that the means of the two distributions are different.

In order to perform hypothesis testing, we need to know the probability distribution of the selected statistic under the null hypothesis.
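As a concrete reference point for the parametric approach, here is a minimal sketch of the two-sample t-test just described (our illustration, not part of the memo); it assumes equal-variance Gaussian classes and uses SciPy only for the Student-t tail probability:

```python
# Two-sample t-test under the equal-variance Gaussian assumption.
# Illustrative sketch; names are ours.
import numpy as np
from scipy import stats

def two_sample_t_test(a, b):
    """t-statistic and two-sided p-value for H0: equal means."""
    na, nb = len(a), len(b)
    # Pooled variance estimate under the equal-variance assumption.
    sp2 = ((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2)
    t = (np.mean(a) - np.mean(b)) / np.sqrt(sp2 * (1.0 / na + 1.0 / nb))
    # Integral of Student's distribution over values beyond |t|.
    p = 2.0 * stats.t.sf(abs(t), df=na + nb - 2)
    return t, p
```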

In general, the distribution for a particular statistic cannot be computed without making strong assumptions on the generative model of the data. Non-parametric techniques, such as permutation tests, can be of great value if the distribution of the data is unknown. Permutation tests were first introduced as a non-parametric alternative to the one-dimensional t-test and have been used to replace Student's distribution when the normality of the data distribution could not be assured. Here, we describe the general formulation of the test as applied to any statistic T.

Suppose we have chosen an appropriate statistic T and the acceptable significance level α. Let Π_ℓ be the set of all permutations of the indices 1, ..., ℓ, where ℓ is the number of independent examples in the dataset. The permutation test procedure that consists of M iterations is defined as follows:

- Repeat M times (with index m = 1, ..., M): sample a permutation π^m from a uniform distribution over Π_ℓ, and compute the statistic value for this permutation of labels, t_m = T(x_1, y_{π^m_1}, ..., x_ℓ, y_{π^m_ℓ}).

- Construct an empirical cumulative distribution ˆP(T ≤ t) = (1/M) Σ_{m=1}^M Θ(t − t_m), where Θ is a step-function (Θ(x) = 1 if x ≥ 0; 0 otherwise).

- Compute the statistic value for the actual labels, t_0 = T(x_1, y_1, ..., x_ℓ, y_ℓ), and its corresponding p-value p_0 under the empirical distribution ˆP. If p_0 ≤ α, reject the null hypothesis.

The procedure computes an empirical estimate of the cumulative distribution of the statistic T under the null hypothesis and uses it for hypothesis testing. Since the null hypothesis assumes that the two classes are indistinguishable with respect to the selected statistic, all the training datasets generated through permutations are equally likely to be observed under the null hypothesis, yielding the estimates of the statistic for the empirical distribution. An equivalent result is obtained if we choose to permute the data, rather than the labels. Ideally, we would like to use the entire set of permutations Π_ℓ to construct the empirical distribution ˆP, but this might not be feasible for computational reasons. Instead, we resort to sampling from Π_ℓ. It is therefore important to select the number of sampling iterations M to be large enough to guarantee accurate estimation. One solution is to monitor the rate of change in the estimated distribution and stop when the changes are below an acceptable threshold. The precision of the test therefore depends on the number of iterations and the number of distinct labelings of the training set.

To better understand the difference between the parametric approach of the t-test and permutation testing, observe that statistical significance does not provide an absolute measure of how robust the observed differences are, but is rather contingent upon certain assumptions about the data distribution in each class p(x|y) being true. The t-test assumes that the distribution of data in each class is Gaussian, while the permutation test assumes that the data distribution is adequately represented by the sample data. Neither estimates how well the sample data describe the general population, which is one of the fundamental questions in statistical learning theory and is outside the scope of this paper.
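The procedure above translates directly into code. The following sketch (ours, with illustrative names) implements the M-iteration test for an arbitrary statistic T, under the convention that larger values of T indicate a greater difference between the classes:

```python
# Generic permutation test for a statistic T(X, y), following the
# procedure in the text. Illustrative sketch; the convention here is
# that larger T means the two classes look more different.
import numpy as np

def permutation_test(T, X, y, M=10000, alpha=0.05, seed=None):
    rng = np.random.default_rng(seed)
    t0 = T(X, y)                                   # statistic on the actual labels
    t_perm = np.array([T(X, rng.permutation(y)) for _ in range(M)])
    # p-value of t0 under the empirical null distribution:
    # the fraction of permutations that look at least as different.
    p0 = np.mean(t_perm >= t0)
    return t0, p0, bool(p0 <= alpha)               # True means: reject the null
```

When the statistic is an error estimate, so that small values indicate a difference between the classes, the comparison flips to t_perm <= t0; that is the form used for classification in the next section.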

3. Permutation Tests for Classification

Permutation tests can be used to assess statistical significance of the classifier and its performance, using an empirical estimate of the test error as a statistic that measures dissimilarity between two populations. Depending on the amount of the available data, the test error can be estimated on a large hold-out set or using cross-validation in every iteration of the permutation procedure. The null hypothesis assumes that the relationship between the data and the labels cannot be learned reliably by the family of classifiers used in the training step. The alternative hypothesis is that we can train a classifier with small expected error. We use permutations to estimate the empirical cumulative distribution of the classifier error under the null hypothesis. For any value of the estimated error e, the appropriate p-value is ˆP(e) (i.e., the probability of observing classification error lower than e under the null hypothesis). We can reject the null hypothesis and declare that the classifier learned the (probabilistic) relationship between the data and the labels with a risk of being wrong with probability of at most ˆP(e).

The permutation procedure is equivalent to sampling new training sets from a probability distribution that can be factored into the original marginals for the data and the labels, p(x, y) = p(x)p(y), because it leaves the data unchanged and maintains the relative frequencies of the labels through permutation, while destroying any relationship between the data and the labels. The test evaluates the likelihood of the test error estimate on the original dataset relative to the empirical distribution of the error on data sets sampled from p(x)p(y). To underscore the point made in the previous section, the test uses only the available data examples to evaluate the complexity of the classification problem, and is therefore valid only to the extent that the available dataset represents the true distribution p(x, y). Unlike standard convergence bounds, such as bounds based on VC-dimension, the empirical probability distribution of the classification error under the null hypothesis says nothing about how well the estimated error rate will generalize. Thus permutation tests provide a weaker guarantee than the convergence bounds, but they can still be useful in testing whether the observed classification results are likely to be obtained by chance, due to a spurious pattern correlated with the labels in the given small data set. Note that the estimated empirical distribution also depends on the classifier family and the training algorithm used to construct the classifier function. It essentially estimates the expressive power of the classifier family with respect to the training dataset. The variance of the empirical distribution ˆP constructed by the permutation test is a function of two quantities: the randomness due to the small sample size and the difficulty of the classification problem. The variance decreases with more samples and easier classification problems.

4. Application of the Test

We use permutation testing in our work to assess the significance of the observed classification accuracy before we conclude that the results obtained in the cross-validation procedure are robust, or decide that more data are needed before we can trust the detected pattern or trend in the biological data. In this section, we first demonstrate the procedure in detail on two different examples, a study of changes in the cortical thickness due to Alzheimer's disease using MRI scans for measurement, and a discrimination between two types of leukemia based on DNA microarray data.
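Before turning to the examples, here is the test of Section 3 in code, an illustrative sketch rather than the authors' implementation. The statistic is an estimate of the test error, here a single hold-out estimate per iteration for brevity; `train_fn(X, y)` is assumed to return a fitted classifier with a `predict` method (scikit-learn style). Following the text, only the training labels are permuted prior to training:

```python
# Permutation test with classifier test error as the statistic.
import numpy as np

def holdout_error(train_fn, X, y, n_train, rng, permute_train=False):
    idx = rng.permutation(len(y))
    train, test = idx[:n_train], idx[n_train:]
    # Under the null hypothesis, the training labels are permuted.
    y_train = rng.permutation(y[train]) if permute_train else y[train]
    clf = train_fn(X[train], y_train)
    return np.mean(clf.predict(X[test]) != y[test])

def classification_permutation_pvalue(train_fn, X, y, n_train, M=10000, seed=None):
    rng = np.random.default_rng(seed)
    e0 = holdout_error(train_fn, X, y, n_train, rng)         # error on true labels
    null_errors = np.array(
        [holdout_error(train_fn, X, y, n_train, rng, permute_train=True)
         for _ in range(M)])
    # p-value = empirical probability of observing error <= e0 under the null.
    return e0, np.mean(null_errors <= e0)
```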

Figure 1: Estimated test error (left) and statistical significance (middle) computed for different training set sizes N and test set sizes K, and empirical error distribution (right) constructed for N = 50 and different test set sizes K in the cortical thickness study. Filled circles on the right graph indicate the classifier performance on the true labels (K = 10: e = .30, p = .19; K = 20: e = .29, p = .08; K = 40: e = .29, p = .03).

These studies involve very different types of data, but both share a small sample size and high dimensionality of the original input space, rendering the convergence bounds extremely loose. We then report experimental results on more examples from real biological studies and offer practical guidelines on application of the test. In all experiments reported in this section, we used linear Support Vector Machines (Vapnik, 1998) to train a classifier, and jackknifing (i.e., sampling without replacement) for cross-validation. The number of cross-validation iterations was 1,000, and the number of permutation iterations was 10,000.

4.1 Detailed Examples

The first example compares the thickness of the cortex in 50 patients diagnosed with dementia of the Alzheimer type and 50 normal controls of matched age. The cortical sheet was automatically segmented from each MRI scan, followed by a registration step that brought the surfaces into correspondence by mapping them onto a unit sphere while minimizing distortions and then aligning the cortical folding patterns (Fischl et al., 1999, Fischl and Dale, 2000). The cortical thickness was densely sampled on a 1 mm grid at corresponding locations for all subjects, resulting in over 300,000 thickness measurements. The measurements in neighboring locations are highly correlated, as both the pattern of thickness and the pattern of its change are smooth over the surface of the cortex, leading us to believe that learning the differences between the two groups might be possible with a reasonable number of examples.

We start by studying the behavior of the estimated error and its statistical significance as a function of training set size and test set size, reported in Figure 1. Every point in the first two graphs is characterized by a corresponding training set size N and test set size K, drawn from the original dataset. In permutation testing, the labels of the training data are permuted prior to training. It is not surprising that increasing the number of training examples improves the robustness of classification, as exhibited by both the accuracy and the significance estimates.
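The error estimates in these figures come from jackknife cross-validation with a linear SVM. A sketch of such an estimator follows, using scikit-learn as our stand-in (the memo does not specify an implementation, and the class-stratified splits implied by the experiments, N/2 examples per class, are omitted here for brevity):

```python
# Jackknife (sampling without replacement) estimate of the test error
# of a linear SVM; illustrative sketch with scikit-learn.
import numpy as np
from sklearn.svm import LinearSVC

def jackknife_error(X, y, n_train, n_test, n_iter=1000, seed=None):
    rng = np.random.default_rng(seed)
    errors = np.empty(n_iter)
    for i in range(n_iter):
        idx = rng.permutation(len(y))
        train, test = idx[:n_train], idx[n_train:n_train + n_test]
        clf = LinearSVC().fit(X[train], y[train])
        errors[i] = np.mean(clf.predict(X[test]) != y[test])
    return errors.mean()
```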

Figure 2: Estimated test error and statistical significance for different training set sizes N for the cortical thickness study. Unlike the experiments in Figure 1, all of the examples unused in training were used to test the classifier. The p-values are shown on a logarithmic scale.

By examining the left graph, we conclude that at approximately N = 40, the accuracy of the classification saturates at 71% (e = .29). In testing for significance, we typically expect to work in this region of relatively slow change in the estimated test error. Increasing the number of independent examples on which we test the classifier in each iteration does not significantly affect the estimated classification error, but substantially improves the statistical significance of the same error value, as is to be expected: when we increase the test set size, a classifier trained on a random labeling of the training data is less likely to maintain the same level of testing accuracy. The right graph in Figure 1 illustrates this point for a particular training set size of N = 50 (well within the saturation range for the expected error estimates). It shows the empirical distribution ˆP(e) curves for the test set sizes K = 10, 20, 40. The filled circles represent classification performance on the true labels and the corresponding p-values. We note again that the three circles represent virtually the same accuracy, but substantially different p-values. For this training set size, if we set the significance threshold at α = .05, testing on K = 40 achieves statistical significance (p = .03, i.e., a 3% chance of observing better than 71% accuracy in cross-validation if the data and the labels in this problem are truly independent).

In the experiments described above, most iterations of cross-validation and permutation testing did not use all available examples (i.e., N + K was less than the total number of examples). These were constructed to illustrate the behavior of the test for the same test set size but varying training set sizes. In practice, one should use all available data for testing, as it will only improve the significance. Figure 2 shows the estimated classification error and the corresponding p-values that were estimated using all of the examples left out in the training step for testing the classifier. And while the error graph looks very similar to that in Figure 1, the behavior of the significance estimates is quite different. The p-values originally decrease as the training set size increases, but after a certain point, they start growing. Two conflicting factors control the p-value estimates as the number of training examples increases: the improved accuracy of the classification, which causes the point of interest to slide to the left and, as a result, down on the empirical cdf curve, and the decreasing number of test examples, which causes the empirical cdf curve to become more shallow. In our experience, the minimum of the p-value often lies in the region of slow change in error values.
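The sweep behind Figures 1 and 2 can be reproduced with the sketches above. A hypothetical driver loop, assuming X and y are already loaded:

```python
# Hypothetical sweep over training set sizes, testing on all remaining
# examples in each iteration; reuses the sketches defined above.
from sklearn.svm import LinearSVC

train_svm = lambda X, y: LinearSVC().fit(X, y)

# Stop early enough to leave at least 10 test examples per iteration.
for n_train in range(20, len(y) - 10, 10):
    n_test = len(y) - n_train            # use all left-out examples for testing
    err = jackknife_error(X, y, n_train, n_test)
    _, p = classification_permutation_pvalue(train_svm, X, y, n_train)
    print(f"N={n_train:3d}  error={err:.2f}  p-value={p:.4f}")
```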

Figure 3: Estimated test error and statistical significance for different training set sizes N for the leukemia morphology study. All of the examples unused in training were used to test the classifier. The p-values are shown on a logarithmic scale.

In this particular example, the minimum p-value, p = 0.02, is achieved at N = 50. It is smaller than the numbers reported earlier because in this experiment we use more test examples (K = 50) in each iteration of permutation testing.

Before proceeding to the summary reporting of the experimental results for other studies, we demonstrate the technique on one more detailed example. The objective of the underlying experiment was to accurately discriminate acute myeloid leukemia (AML) from acute lymphoblastic leukemia (ALL). The data set contains 48 samples of AML and 25 samples of ALL. Expression levels of 7,129 genes and expressed sequence tags (ESTs) were measured via an oligonucleotide microarray for each sample (Golub et al., 1999, Slonim et al., 2000). Figure 3 shows the results for this study. The small number of available examples forces us to work with a much smaller range of training and test set sizes, but, in contrast to the previous study, we achieve statistical significance with substantially fewer data points. The cross-validation error reduces rapidly as we increase the number of training examples, dropping below 5% at N = 26 training examples. The p-values also decrease very quickly as we increase the number of training examples, achieving a minimum of .001 at N = 28 training examples. Similarly to the previous example, the most statistically significant result lies in the range of relatively slow error change.

As both examples demonstrate, it is not necessarily best to use most of the data for training. Additional data might not improve the testing accuracy in a substantial way and could be much more useful in obtaining more accurate estimates of the error and the p-value. In fact, a commonly used leave-one-out procedure that only uses one example for testing in each cross-validation iteration procures extremely noisy estimates. In all the experiments reported in the next section, we use at least 10 examples for testing in each iteration.

4.2 Experimental Results

We report experimental results for nine different studies involving classification of biological data derived from imaging of two different types, as well as microarray measurements. Table 1 summarizes the number of features, which determines the input space dimensionality, the number of examples in each class for each study, and the results of statistical analysis. The studies are loosely sorted in ascending strength of the results (decreasing error and p-values).

Table 1: Summary of the experimental data and results. The first four columns describe the study: the name of the experiment, the number of features and the number of examples in each class (pos and neg). Columns 5-7 report the number of training examples from each class N/2, the jackknife error e and the p-value p for the smallest training set size that achieves significance at α = .05. Columns 8-10 report the results for the training set size that yields the smallest p-value. The last column contains the lowest error e_min observed in the experiment. The rows, in order, are: Lymphoma outcome, Brain cancer outcome, Breast cancer outcome, MRI, Medullo vs. glioma, AML vs. ALL, fMRI full, fMRI reduced, and Tumor vs. normal. (The numeric entries of the table did not survive the transcription.)

Imaging Data. In addition to the MRI study of cortical thickness described above ("MRI" in Table 1), we include the results of two fMRI (functional MRI) experiments that compare the patterns of brain activations in response to different visual stimuli in a single subject. We present the results of comparing activations in response to face images to those induced by house images, as these categories are believed to have special representation in the cortex. The feature extraction step was similar to that of the MRI study, treating the activation signal as the measurement sampled over the cortical surface. The first experiment ("fMRI full") used the entire cortical surface for feature extraction, while the second experiment ("fMRI reduced") considered only the visually active region of the cortex. The mask for the visually active voxels was obtained using a separate visual task. The goal of using the mask was to test if removing irrelevant voxels from consideration improves the classification performance.

Microarray Data. In addition to the leukemia morphology study ("AML vs. ALL"), we include the results of five other expression datasets where either the morphology or the treatment outcome was predicted (Mukherjee et al., 2003). Three studies involved predicting treatment outcome: survival of lymphoma patients, survival of patients with brain cancer, and metastasis of breast cancers. Three other studies involved predicting morphological properties of the tissue: medulloblastomas (medullo) vs. glioblastomas (glio) (glioblastomas are tumors of glial cells in the brain, while medulloblastomas are tumors of neural tissue), AML vs. ALL, and tumor tissue vs. normal.

For each experiment, we analyzed the behavior of the cross-validation error and the statistical significance similarly to the detailed examples presented earlier. Table 1 summarizes the results for three important events: the first time the p-value plot crosses the .05 threshold (thus achieving statistical significance at that level), the point of the lowest p-value, and the lowest classification error.

For the first two events, we report the number of training examples, the error and the p-value. The lowest error is typically achieved for the largest training set size and is shown here mainly for comparison with the other two error values reported. We observe that the error values corresponding to the lowest p-values are very close to the smallest errors reported, implying that the p-values bottom out in the region of relatively slow change in the error estimates.

The first three studies in the table did not produce statistically significant results. In the first two studies, the rest of the indicators are extremely weak (in (Mukherjee et al., 2003), a gene selection procedure led to greater accuracy and smaller p-values). In the third study, predicting whether a breast tumor will metastasize, the error stabilizes fairly early (training on 30 examples leads to 39% error, while the smallest error observed is 38%, obtained by training on 58 examples), but the p-values are too high. This leads us to believe that more data could help establish the significance of the result, similarly to the MRI study. Unfortunately, the error in these two studies is too high to be useful in a diagnostic application. The rest of the studies achieve relatively low errors and p-values. The last study in the table, predicting cancerous tissue from normal tissue, yields a highly significant result (p < 10^-6), with the error staying very stable from training on 90 examples to training on 170 examples. We also observe that the significance threshold of .05 is probably too high for these experiments, as the corresponding error values are significantly higher than the ones reported for the smallest p-value. A more stringent threshold of .01 would cause most experiments to produce more realistic estimates of the cross-validation error attainable on the given data set.

4.3 Summary and Heuristics

In this section, we show how permutation testing in conjunction with cross-validation can be used to analyze the quality of classification results on scientific data. Here, we provide a list of practical lessons learned from the empirical studies that we hope will be useful to readers applying this methodology.

1. Interpreting the p-value. Two factors affect statistical significance: the separation between the classes and the amount of data we have to support it. We can achieve low p-values in a situation where the two classes are very far apart and we have a few data points from each group, or when they are much closer, but we have substantially more data. The p-value by itself does not indicate which of these two situations is true. However, looking at both the p-value and the classifier accuracy gives us an indication of how easy it is to separate the classes.

2. Size of the hold-out set. In our experience, small test set sizes in cross-validation and permutation testing lead to noisy estimates of the classification accuracy and the significance. We typically limit the training set size to allow at least 10 test examples in each iteration of the resampling procedures.

3. Size of the training set. Since we are interested in robust estimates of the test error, one should utilize a sufficient number of training examples to be working in the region where the cross-validation error does not vary much.

Cross-checking this with the region of the lowest p-values provides another useful indication of the acceptable training set size. In our experiments with artificial data (not shown here), the number of training examples at which the p-values stop decreasing dramatically remains almost constant as we add more data. This could mean that acquiring more experimental data will not change the optimal training set size substantially, only lower the resulting p-values.

4. Performance on future samples. As we pointed out in earlier sections, the permutation test does not provide a guarantee on how close the observed classification error is to the true expected error. Thus we might achieve statistical significance, but still have an inaccurate estimate of the expected error. This brings us back to the main assumption made by most non-parametric procedures, namely that the available examples capture the relevant properties of the underlying probability distribution with adequate precision. The variance-based convergence bounds and the VC-style generalization bounds are still the only two ways known to us to obtain a distribution-free guarantee on the expected error of the classifier. The next section relates permutation testing to the generalization bounds and offers a theoretical justification for the procedure.

5. A Theoretical Motivation for the Permutation Procedure

The purpose of the permutation test is to show that it is very unlikely that a permuted dataset will achieve the same cross-validation error as the cross-validation error on the unpermuted dataset. In this paper, we restrict our proofs to leave-one-out cross-validation, with future plans to extend this work to the general case. Ideally, we would like to show that for a reasonably constrained family of classifiers (for example, a VC class) the following two facts hold: the leave-one-out error of the classifier is close to the error rate of one of the best classifiers in the class, and the leave-one-out error on the permuted dataset is close to the smaller of the prior probabilities of the two classes. In Section 5.1, we prove that the training error on the permuted data concentrates around the smaller prior probability. In Section 5.2, we combine previously known results to state that the leave-one-out error of the classifier is close to the error rate of one of the best classifiers in the class. In Section 5.3, we extend a result from (Mukherjee et al., 2002) about the leave-one-out procedure to relate the training error on permuted data to the leave-one-out error. We close with some comments relating the theoretical results to the empirical permutation procedure.

5.1 The Permutation Problem

We are given a class of concepts C and an unknown target concept c_0. Without loss of generality we will assume that P(c_0) ≤ 1/2 and that the empty concept ∅ ∈ C. For a permutation π of the training data, the smallest training error on the permuted set is

  e_ℓ(π) = min_{c∈C} P_ℓ(c △ c_0)    (1)
         = min_{c∈C} (1/ℓ) Σ_{i=1}^ℓ [ I(x_i ∈ c, x_{π_i} ∉ c_0) + I(x_i ∉ c, x_{π_i} ∈ c_0) ],

where x_i is the i-th sample and x_{π_i} is the i-th sample after permutation.
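The reduction of (1) to the maximization problem stated in the next paragraph rests on a one-line indicator identity; spelling it out (an expository addition, in our notation):

```latex
% For any concept c and target c_0, writing I(.) for the indicator:
\begin{align*}
I(x_i \in c,\, x_{\pi_i} \notin c_0) + I(x_i \notin c,\, x_{\pi_i} \in c_0)
 &= I(x_i \in c)\bigl(1 - I(x_{\pi_i} \in c_0)\bigr)
  + \bigl(1 - I(x_i \in c)\bigr) I(x_{\pi_i} \in c_0) \\
 &= I(x_{\pi_i} \in c_0) - I(x_i \in c)\bigl(2\, I(x_{\pi_i} \in c_0) - 1\bigr).
\end{align*}
% Averaging over i = 1, ..., l turns the first term into the empirical
% measure P_l(x in c_0); minimizing over c in C then yields the
% maximization problem below.
```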

For a fixed concept c ∈ C, the average error is

  E P_ℓ(c △ c_0) = P(c)(1 − P(c_0)) + (1 − P(c)) P(c_0),

and it is clear that, since P(c_0) ≤ 1/2, taking c = ∅ will minimize the average error, which in that case will be equal to P(c_0). Thus, our goal will be to show that under some complexity assumptions on the class C the smallest training error e_ℓ(π) is close to P(c_0). Minimizing (1) is equivalent to the following maximization problem,

  max_{c∈C} (1/ℓ) Σ_{i=1}^ℓ I(x_i ∈ c)(2 I(x_{π_i} ∈ c_0) − 1),

since

  e_ℓ(π) = P_ℓ(x ∈ c_0) − max_{c∈C} (1/ℓ) Σ_{i=1}^ℓ I(x_i ∈ c)(2 I(x_{π_i} ∈ c_0) − 1),

and P_ℓ(x ∈ c_0) is the empirical measure of the random concept. We would like to show that e_ℓ(π) is close to the random error P(x ∈ c_0) and give rates of convergence. We will do this by bounding the process

  G_ℓ(π) = sup_{c∈C} (1/ℓ) Σ_{i=1}^ℓ I(x_i ∈ c)(2 I(x_{π_i} ∈ c_0) − 1)

and using the fact that, by Chernoff's inequality, P_ℓ(x ∈ c_0) is close to P(x ∈ c_0):

  IP( |P_ℓ(x ∈ c_0) − P(x ∈ c_0)| ≥ √( 2 P(c_0)(1 − P(c_0)) t / ℓ ) ) ≤ e^{−t}.    (2)

Theorem 1. If the concept class C has VC dimension V, then with probability 1 − K e^{−t/K},

  G_ℓ(π) ≤ K min( √( V log ℓ / ℓ ), V log ℓ / ( ℓ (1 − 2P(c_0))² ) ) + K √( t / ℓ ).

Remark. The second term in the above bound comes from the application of Chernoff's inequality, similar to (2), and thus has a one-dimensional nature, in the sense that it does not depend on the complexity (VC dimension) of the class C. An interesting property of this result is that if P(c_0) < 1/2, then the first term, which depends on the VC dimension V, will be of order V log ℓ / ℓ, which, ignoring the one-dimensional terms, gives the zero-error type rate of convergence of e_ℓ(π) to P_ℓ(x ∈ c_0).

Combining this theorem and equation (2), we can state that with probability 1 − K e^{−t/K},

  | e_ℓ(π) − P(x ∈ c_0) | ≤ K min( √( V log ℓ / ℓ ), V log ℓ / ( ℓ (1 − 2P(c_0))² ) ) + K √( t / ℓ ).

In order to prove Theorem 1, we require several preliminary results. We first prove the following useful lemma.

Lemma 1. It is possible to construct on the same probability space two i.i.d. Bernoulli sequences ε = (ε_1, ..., ε_n) and ε' = (ε'_1, ..., ε'_n) such that ε' is independent of ε_1 + ... + ε_n and

  (1/n) Σ_{i=1}^n |ε_i − ε'_i| = | (1/n) Σ_{i=1}^n ε_i − (1/n) Σ_{i=1}^n ε'_i |.

Proof. For k = 0, ..., n, let us consider the following probability space E_k. Each element w of E_k consists of two coordinates, w = (ε, π). The first coordinate ε = (ε_1, ..., ε_n) has the marginal distribution of an i.i.d. Bernoulli sequence. The second coordinate π implements the following randomization. Given the first coordinate ε, consider the set I(ε) = {i : ε_i = 1} and denote its cardinality m = card I(ε). If m ≥ k, then π picks a subset I(π, ε) of I(ε) with cardinality k uniformly; if m < k, then π picks a subset I(π, ε) of the complement I^c(ε) with cardinality n − k, also uniformly. On this probability space E_k, we construct a sequence ε' = ε'(ε, π) in the following way. If k ≤ m = card I(ε), then we set ε'_i = 1 if i ∈ I(π, ε) and ε'_i = −1 otherwise. If k > m = card I(ε), then we set ε'_i = −1 if i ∈ I(π, ε) and ε'_i = 1 otherwise. Next, we consider the space E = ∪_{k ≤ n} E_k with probability measure

  P(A) = Σ_{k=0}^n B(n, p, k) P(A | E_k),  where B(n, p, k) = (n choose k) p^k (1 − p)^{n−k}.

On this probability space the sequences ε and ε' satisfy the conditions of the lemma. First of all, the number of indices i with ε'_i = 1 has the binomial distribution, since by construction P(card I(ε') = k) = P(E_k) = B(n, p, k). Also, by construction, the distribution of ε' is invariant under permutations of the coordinates. This clearly implies that ε' is i.i.d. Bernoulli. Also, ε' is obviously independent of ε_1 + ... + ε_n. Finally, by construction, (1/n) Σ |ε_i − ε'_i| = |(1/n) Σ ε_i − (1/n) Σ ε'_i|.

Definition 1. Let u > 0 and let C be a set of concepts. Every finite set of concepts c_1, ..., c_n with the property that for all c ∈ C there is a c_j such that

  ( (1/ℓ) Σ_{i=1}^ℓ (c_j(x_i) − c(x_i))² )^{1/2} ≤ u

is called a u-cover with respect to L_2(x_1^ℓ). The covering number N(C, u, {x_1, ..., x_ℓ}) is the smallest number n for which the above holds.

Definition 2. The uniform metric entropy is log N(C, u), where N(C, u) is the smallest integer for which, for all (x_1, ..., x_ℓ), N(C, u, {x_1, ..., x_ℓ}) ≤ N(C, u).

Theorem 2. The following holds with probability greater than 1 − 4 e^{−t/4}:

  G_ℓ(π) ≤ 2 √( 2 t P(c_0)(1 − P(c_0)) / ℓ ) + sup_{r ≥ 0} [ (K/√ℓ) ∫_0^{√µ_r} √( log N(C, u) ) du − µ_r (1 − 2P(c_0)) / 2 + √( µ_r (t + 2 log(r + 1)) / ℓ ) ],

where µ_r = 2^{−r} and log N(C, u) is the uniform metric entropy of the class C.

Proof. The process can be rewritten as

  G_ℓ(π) = sup_{c∈C} (1/ℓ) Σ_{i=1}^ℓ I(x_i ∈ c) ε_i,

where ε_i = 2 I(x_{π_i} ∈ c_0) − 1 = ±1 are Bernoulli random variables with P(ε_i = 1) = P(c_0). Due to the permutations, the random variables (ε_i) depend on (x_i) only through the cardinality of {x_i ∈ c_0}. By Lemma 1 we can construct a random Bernoulli sequence (ε'_i) that is independent of x_1, ..., x_ℓ and for which

  G_ℓ(π) ≤ sup_{c∈C} (1/ℓ) Σ_{i=1}^ℓ I(x_i ∈ c) ε'_i + (1/ℓ) Σ_{i=1}^ℓ |ε_i − ε'_i|.

We first control the second term,

  (1/ℓ) Σ |ε_i − ε'_i| = | (1/ℓ) Σ ε_i − (1/ℓ) Σ ε'_i | ≤ | (1/ℓ) Σ ε_i − (2P(c_0) − 1) | + | (1/ℓ) Σ ε'_i − (2P(c_0) − 1) |;

then, using Chernoff's inequality twice, we get with probability 1 − 2 e^{−t}

  (1/ℓ) Σ |ε_i − ε'_i| ≤ 2 √( 2 t P(c_0)(1 − P(c_0)) / ℓ ).

We block concepts in C into levels

  C_r = { c ∈ C : (1/ℓ) Σ_{i=1}^ℓ I(x_i ∈ c) ∈ (2^{−r−1}, 2^{−r}] }

and denote µ_r = 2^{−r}. We define the processes

  R(r) = sup_{c∈C_r} (1/ℓ) Σ_{i=1}^ℓ I(x_i ∈ c) ε'_i

and obtain

  sup_{c∈C} (1/ℓ) Σ_{i=1}^ℓ I(x_i ∈ c) ε'_i ≤ sup_r R(r).

By Talagrand's inequality on the cube (Talagrand, 1995), we have for each level r

  IP_{ε'}( R(r) ≥ E_{ε'} sup_{c∈C_r} (1/ℓ) Σ I(x_i ∈ c) ε'_i + √( µ_r t / ℓ ) ) ≤ e^{−t/4}.

Note that for this inequality to hold, the random variables (ε'_i) need only be independent; they do not need to be symmetric. This bound is conditioned on a given {x_i}, and therefore

  IP( R(r) ≥ E_{ε'} sup_{c∈C_r} (1/ℓ) Σ I(x_i ∈ c) ε'_i + √( µ_r t / ℓ ) ) ≤ e^{−t/4}.

If, for each r, we set t → t + 2 log(r + 1), we can write

  IP( ∃ r : R(r) ≥ E_{ε'} sup_{c∈C_r} (1/ℓ) Σ I(x_i ∈ c) ε'_i + √( µ_r (t + 2 log(r + 1)) / ℓ ) ) ≤ Σ_{r=0}^∞ (r + 1)^{−2} e^{−t/4} ≤ 2 e^{−t/4}.

Using standard symmetrization techniques, we add and subtract an independent sequence ε''_i such that E ε''_i = E ε'_i = 2P(c_0) − 1:

  E_{ε'} sup_{c∈C_r} (1/ℓ) Σ I(x_i ∈ c) ε'_i
    = E_{ε'} sup_{c∈C_r} [ (1/ℓ) Σ I(x_i ∈ c) ε'_i − (1/ℓ) Σ I(x_i ∈ c) E ε''_i + (1/ℓ) Σ I(x_i ∈ c)(2P(c_0) − 1) ]
    ≤ E_{ε',ε''} sup_{c∈C_r} (1/ℓ) Σ I(x_i ∈ c)(ε'_i − ε''_i) − (1 − 2P(c_0)) inf_{c∈C_r} (1/ℓ) Σ I(x_i ∈ c)
    ≤ 2 E_η sup_{c∈C_r} (1/ℓ) Σ I(x_i ∈ c) η_i − µ_r (1 − 2P(c_0)) / 2,

where η_i = (ε'_i − ε''_i)/2 takes values in {−1, 0, 1} with P(η_i = 1) = P(η_i = −1). One can easily check that the random variables η_i satisfy the inequality

  IP( Σ_{i=1}^ℓ η_i a_i > t ) ≤ exp( − t² / (2 Σ_{i=1}^ℓ a_i²) ),

which is the only prerequisite for the chaining method. Thus, one can write Dudley's entropy integral bound (van der Vaart and Wellner, 1996) as

  E_η sup_{c∈C_r} (1/ℓ) Σ I(x_i ∈ c) η_i ≤ (K/√ℓ) ∫_0^{√µ_r} √( log N(C, u) ) du.

We finally get

  IP( ∃ r : R(r) ≥ (K/√ℓ) ∫_0^{√µ_r} √( log N(C, u) ) du − µ_r (1 − 2P(c_0)) / 2 + √( µ_r (t + 2 log(r + 1)) / ℓ ) ) ≤ 2 e^{−t/4}.

This completes the proof of Theorem 2.

Proof of Theorem 1. For a class with VC dimension V, it is well known (van der Vaart and Wellner, 1996) that

  ∫_0^{√µ_r} √( log N(C, u) ) du ≤ K √( V µ_r log(2/µ_r) ).

Since without loss of generality we only need to consider µ_r > 1/ℓ, it remains to apply Theorem 2 and notice that

  sup_r [ K √( V µ_r log(2/µ_r) / ℓ ) − µ_r (1 − 2P(c_0)) / 2 ] ≤ K min( √( V log ℓ / ℓ ), V log ℓ / ( ℓ (1 − 2P(c_0))² ) ).

All other terms that do not depend on the VC dimension V can be combined to give K √( t / ℓ ).

5.2 Convergence of the Leave-One-Out Error to the Best in the Class

Now we turn our attention to the cross-validation error and show that the leave-one-out error of the classifier is close to the error rate of one of the best classifiers in the class. Given a training set generated by an unknown concept c_0 and a class of concepts C, the training step selects a concept ĉ from this class via empirical risk minimization,

  ĉ ∈ arg min_{c∈C} P_ℓ(c △ c_0),

where P_ℓ(c △ c_0) is the empirical symmetric difference observed on the points in the dataset S = {(x_i, y_i)}. This error is called the training error. The best classifiers in the class are ones for which

  c_opt ∈ arg min_{c∈C} P(c △ c_0).

It is well known that if C has VC dimension V, then with probability 1 − e^{−t} (Vapnik, 1998)

  P(ĉ △ c_0) − P(c_opt △ c_0) ≤ K √( V log ℓ / ℓ ),    (3)

so the expected error of the empirical minimizer approaches the error rate of the optimal classifiers in the class. We now relate the leave-one-out error to the expected error of the empirical minimizer. The classifier ĉ_{S^i} is the empirical minimizer for the dataset with the i-th point left out, and we write the leave-one-out error as

  (1/ℓ) Σ_{i=1}^ℓ I( x_i ∈ ĉ_{S^i} △ c_0 ).

Theorem 4.2 in (Kearns and Ron, 1999) states that for empirical risk minimization on a VC class, with probability 1 − δ,

  | (1/ℓ) Σ_{i=1}^ℓ I( x_i ∈ ĉ_{S^i} △ c_0 ) − P(ĉ △ c_0) | ≤ (K/δ) √( V log ℓ / ℓ ).

The above bound, coupled with the bound in equation (3), ensures that with high probability the leave-one-out error approaches the error rate of one of the best classifiers in the class.

5.3 Relating the Training Error to the Leave-One-Out Error

In the two previous subsections we demonstrated that the leave-one-out error approaches the error rate of one of the best classifiers in the class and that the training error on the permuted data concentrates around the smaller prior probability. However, in our statistical test we use the cross-validation error of the permuted data, rather than the training error of the permuted data, to build the empirical distribution function. So we have to relate the leave-one-out procedure to the training error for permuted data. In the case of empirical minimization on VC classes, the training error can be related to the leave-one-out error by the following bound (Kearns and Ron, 1999, Mukherjee et al., 2002) (see also Appendix A):

  E_S | (1/ℓ) Σ_{i=1}^ℓ I( x_i ∈ ĉ_{S^i} △ c_0 ) − P_ℓ(ĉ △ c_0) | ≤ Θ( √( V log ℓ / ℓ ) ),

where E_S is the expectation over datasets S, V is the VC dimension, and ĉ_{S^i} is the empirical minimizer for the dataset with the i-th point left out. This bound is of the same order as the deviation between the empirical and expected errors of the empirical minimizer:

  E_S | P_ℓ(ĉ △ c_0) − P(ĉ △ c_0) | ≤ Θ( √( V log ℓ / ℓ ) ).

This inequality implies that, on average, the leave-one-out error is not a significantly better estimate of the test error than the training error. In the case of the permutation procedure we show that a similar result holds.

Theorem 3. If the family of classifiers has VC dimension V, then

  E_{S,π} | (1/ℓ) Σ_{i=1}^ℓ I( x_{π_i} ∈ ĉ_{S^i,π} △ c_0 ) − P(x ∈ c_0) | ≤ K √( V log ℓ / ℓ ),

where E_{S,π} is the expectation over datasets S and permutations π, ĉ_{S^i,π} is the empirical minimizer of the permuted dataset with the i-th point left out, and x_{π_i} is the i-th point after the permutation. The proof of this theorem is in Appendix A. From Section 5.1 we have that

  E_{S,π} | P_ℓ(ĉ_π △ c_0) − P(x ∈ c_0) | ≤ K √( V log ℓ / ℓ ).

Therefore, one can conclude that for the permutation procedure, on average, the leave-one-out error is not a significantly better estimate of the random error than the training error.

5.4 Comments on the Theoretical Results

The theoretical results in this section were meant to give an analysis of, and motivation for, the permutation tests. The bounds derived are not meant to replace the empirical permutation procedure. Similarly to VC-style generalization bounds, they would require amounts of data far beyond the range of samples realistically attainable in real experiments to obtain practically useful deviations (of the order of a few percent), which is precisely the motivation for the empirical permutation procedure. Moreover, we state that the leave-one-out procedure is not a significantly better estimate of the error than the training error. This statement must be taken with a grain of salt, since it was derived via a worst-case analysis. In practice, the leave-one-out estimator is almost always a better estimate of the test error than the training error and, for empirical risk minimization, the leave-one-out error is never smaller than the training error (Mukherjee et al., 2002):

  (1/ℓ) Σ_{i=1}^ℓ I( x_i ∈ ĉ_{S^i} △ c_0 ) ≥ (1/ℓ) Σ_{i=1}^ℓ I( x_i ∈ ĉ_S △ c_0 ).

Also, it is important to note that the relation we state between the leave-one-out error and the training error is in expectation. Ideally, we would like to relate these quantities in probability with similar rates. It is an open question whether this holds.

6. Conclusion and Open Problems

This paper describes and explores an approach to estimating statistical significance of a classifier given a small sample size, based on permutation testing. The following is a list of open problems related to this methodology:

1. Size of the training/test set. We provide a heuristic to select the size of the training and the hold-out sets. A more rigorous formulation of this problem might suggest a more principled methodology for setting the training set size. This problem is clearly an example of the ubiquitous bias-variance tradeoff dilemma.

2. Leave-one-out error and training error. In the theoretical motivation, we relate the leave-one-out error to the training error in expectation. The theoretical motivation would be much stronger if this relation were made in probability. A careful experimental and theoretical analysis of the relation between the training error and the leave-n-out error for these types of permutation procedures would be of interest.

3. Feature selection. Both in neuroimaging studies and in DNA microarray analysis, finding the features which most accurately classify the data is very important. Permutation procedures similar to the one described in this paper have been used to address this problem (Golub et al., 1999, Slonim et al., 2000, Nichols and Holmes, 2001). The analysis of permutation procedures for selecting discriminative features seems to be more difficult than the analysis of the permutation procedure for classification. It would be very interesting to extend the type of analysis presented here to the feature selection problem.

4. Multi-class classification. Extending the methodology and the theoretical motivation to the multi-class problem has not been done.

To conclude, we hope other researchers in the community will find the technique useful in assessing statistical significance of observed results when the data are high dimensional and are not necessarily generated by a known distribution.

Appendix A. Proof of Theorem 3

In this appendix, we prove Theorem 3 from Section 5.3. We first define some terms. A concept will be designated c_0. A dataset S is made up of points {z_i}, where z = (x, y). When the i-th point is left out of S we have the set S^i. The empirical minimizer on S is ĉ_S and the empirical minimizer on S^i is ĉ_{S^i}. The empirical error on the set S is P_ℓ(ĉ_S △ c_0), and the empirical error on the set S^i is P_{ℓ-1}(ĉ_{S^i} △ c_0). If we perform empirical risk minimization on a VC class, then with high probability (1 − e^{−t/K})

  | P(ĉ_S △ c_0) − P_ℓ(ĉ_S △ c_0) | ≤ K √( V log ℓ / ℓ ).    (4)

Similarly, with high probability

  | P(ĉ_{S^i} △ c_0) − P_{ℓ-1}(ĉ_{S^i} △ c_0) | ≤ K √( V log(ℓ−1) / (ℓ−1) )   and   | P_ℓ(ĉ_S △ c_0) − P_{ℓ-1}(ĉ_{S^i} △ c_0) | ≤ K/ℓ.

We turn the probability statement into an expectation:

  E_S | P(ĉ_{S^i} △ c_0) − P_ℓ(ĉ_S △ c_0) | ≤ K √( V log ℓ / ℓ ).    (5)

One can check that the expectation over S of the leave-one-out error is equal to the expectation over S of the expected error of ĉ_{S^i} (Mukherjee et al., 2002, Bousquet and Elisseeff, 2002):

  E_S [ (1/ℓ) Σ_{i=1}^ℓ I( x_i ∈ ĉ_{S^i} △ c_0 ) ] = E_S [ P( ĉ_{S^i} △ c_0 ) ].

Combining the above equality with inequalities (4) and (5) gives us the following bounds:

  E_S | (1/ℓ) Σ_{i=1}^ℓ I( x_i ∈ ĉ_{S^i} △ c_0 ) − P_ℓ(ĉ_S △ c_0) | ≤ K √( V log ℓ / ℓ ),
  E_S | (1/ℓ) Σ_{i=1}^ℓ I( x_i ∈ ĉ_{S^i} △ c_0 ) − P(ĉ_S △ c_0) | ≤ K √( V log ℓ / ℓ ).

So for the non-permuted data, the leave-one-out error is not significantly closer to the test error, in expectation, than the training error. We want to show something similar for the case where the concept c_0 is random and the empirical minimizer is constructed on a dataset with labels randomly permuted. Let us denote by ĉ_{S^i,π} the empirical minimizer of the permuted dataset with the i-th point left out. Its leave-one-out error is

  (1/ℓ) Σ_{i=1}^ℓ I( x_{π_i} ∈ ĉ_{S^i,π} △ c_0 ).

If we can show that

  E_{S,π} [ (1/ℓ) Σ_{i=1}^ℓ I( x_{π_i} ∈ ĉ_{S^i,π} △ c_0 ) ] = E_{S,π} [ P( ĉ_{S^i,π} △ c_0 ) ],

then we can use the same argument we used for the non-random case. We start by breaking up the expectations:

  E_{S,π} [ (1/ℓ) Σ_{i=1}^ℓ I( x_{π_i} ∈ ĉ_{S^i,π} △ c_0 ) ] = E_π E_S E_{z_i} I( x_{π_i} ∈ ĉ_{S^i,π} △ c_0 ) = E_π E_S E_{x_i} E_{y_i} I( ĉ_{S^i,π}(x_{π_i}) ≠ y_i ).

The second equality holds because for a random concept p(x, y) = p(y)p(x). One can easily check that the following hold:

  E_{y_i} I( ĉ_{S^i,π}(x_{π_i}) ≠ y_i ) = min( P(y = 1), P(y = -1) ) = P(x ∈ c_0)

and

  E_π E_S E_{x_i} min( P(y = 1), P(y = -1) ) = P(x ∈ c_0).

Similarly, it holds that

  E_{S,π} P( ĉ_{S^i,π} △ c_0 ) = min( P(y = 1), P(y = -1) ) = P(x ∈ c_0).

References

O. Bousquet and A. Elisseeff. Stability and generalization. Journal of Machine Learning Research, 2:499-526, 2002.

B. Efron. The Jackknife, the Bootstrap, and Other Resampling Plans. SIAM, Philadelphia, PA, 1982.

B. Fischl and A.M. Dale. Measuring the thickness of the human cerebral cortex from magnetic resonance images. PNAS, 97:11050-11055, 2000.

B. Fischl, M.I. Sereno, R.B.H. Tootell, and A.M. Dale. High-resolution intersubject averaging and a coordinate system for the cortical surface. Human Brain Mapping, 8:272-284, 1999.

T.R. Golub, D. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:531-537, 1999.

P. Good. Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses. Springer-Verlag, 1994.

I. Guyon, J. Makhoul, R. Schwartz, and V. Vapnik. What size test set gives good error estimates? IEEE Transactions on Pattern Analysis and Machine Intelligence, 20:52-64, 1998.

M. Kearns and D. Ron. Algorithmic stability and sanity-check bounds for leave-one-out cross-validation. Neural Computation, 11:1427-1453, 1999.

M.G. Kendall. The treatment of ties in ranking problems. Biometrika, 33:239-251, 1945.

S. Mukherjee, P. Niyogi, T. Poggio, and R. Rifkin. Statistical learning: Stability is necessary and sufficient for consistency of empirical risk minimization. AI Memo, Massachusetts Institute of Technology, 2002.

S. Mukherjee, P. Tamayo, S. Rogers, R. Rifkin, A. Engle, C. Campbell, T.R. Golub, and J.P. Mesirov. Estimating dataset size requirements for classifying DNA microarray data. Journal of Computational Biology, 10(2):119-142, 2003.

T.E. Nichols and A.P. Holmes. Nonparametric permutation tests for functional neuroimaging: A primer with examples. Human Brain Mapping, 15:1-25, 2001.

L. Sachs. Applied Statistics: A Handbook of Techniques. Springer-Verlag, 1984.

D. Slonim, P. Tamayo, J.P. Mesirov, T.R. Golub, and E. Lander. Class prediction and discovery using gene expression data. In Proceedings of the Fourth Annual Conference on Computational Molecular Biology (RECOMB), pages 263-272, 2000.

Student. The probable error of a mean. Biometrika, 6:1-25, 1908.

M. Talagrand. Concentration of measure and isoperimetric inequalities in product spaces. Publications Mathématiques de l'I.H.E.S., 81:73-205, 1995.

A. van der Vaart and J. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer-Verlag, 1996.

V.N. Vapnik. Statistical Learning Theory. John Wiley & Sons, 1998.


6.434J/16.391J Statistics for Engineers and Scientists May 4 MIT, Spring 2006 Handout #17. Solution 7 6.434J/16.391J Statistics for Engineers and Scientists May 4 MIT, Spring 2006 Handout #17 Soution 7 Probem 1: Generating Random Variabes Each part of this probem requires impementation in MATLAB. For the

More information

Separation of Variables and a Spherical Shell with Surface Charge

Separation of Variables and a Spherical Shell with Surface Charge Separation of Variabes and a Spherica She with Surface Charge In cass we worked out the eectrostatic potentia due to a spherica she of radius R with a surface charge density σθ = σ cos θ. This cacuation

More information

ASummaryofGaussianProcesses Coryn A.L. Bailer-Jones

ASummaryofGaussianProcesses Coryn A.L. Bailer-Jones ASummaryofGaussianProcesses Coryn A.L. Baier-Jones Cavendish Laboratory University of Cambridge caj@mrao.cam.ac.uk Introduction A genera prediction probem can be posed as foows. We consider that the variabe

More information

MATRIX CONDITIONING AND MINIMAX ESTIMATIO~ George Casella Biometrics Unit, Cornell University, Ithaca, N.Y. Abstract

MATRIX CONDITIONING AND MINIMAX ESTIMATIO~ George Casella Biometrics Unit, Cornell University, Ithaca, N.Y. Abstract MATRIX CONDITIONING AND MINIMAX ESTIMATIO~ George Casea Biometrics Unit, Corne University, Ithaca, N.Y. BU-732-Mf March 98 Abstract Most of the research concerning ridge regression methods has deat with

More information

Melodic contour estimation with B-spline models using a MDL criterion

Melodic contour estimation with B-spline models using a MDL criterion Meodic contour estimation with B-spine modes using a MDL criterion Damien Loive, Ney Barbot, Oivier Boeffard IRISA / University of Rennes 1 - ENSSAT 6 rue de Kerampont, B.P. 80518, F-305 Lannion Cedex

More information

Cryptanalysis of PKP: A New Approach

Cryptanalysis of PKP: A New Approach Cryptanaysis of PKP: A New Approach Éiane Jaumes and Antoine Joux DCSSI 18, rue du Dr. Zamenhoff F-92131 Issy-es-Mx Cedex France eiane.jaumes@wanadoo.fr Antoine.Joux@ens.fr Abstract. Quite recenty, in

More information

STA 216 Project: Spline Approach to Discrete Survival Analysis

STA 216 Project: Spline Approach to Discrete Survival Analysis : Spine Approach to Discrete Surviva Anaysis November 4, 005 1 Introduction Athough continuous surviva anaysis differs much from the discrete surviva anaysis, there is certain ink between the two modeing

More information

DIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM

DIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM DIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM MIKAEL NILSSON, MATTIAS DAHL AND INGVAR CLAESSON Bekinge Institute of Technoogy Department of Teecommunications and Signa Processing

More information

8 Digifl'.11 Cth:uits and devices

8 Digifl'.11 Cth:uits and devices 8 Digif'. Cth:uits and devices 8. Introduction In anaog eectronics, votage is a continuous variabe. This is usefu because most physica quantities we encounter are continuous: sound eves, ight intensity,

More information

A unified framework for Regularization Networks and Support Vector Machines. Theodoros Evgeniou, Massimiliano Pontil, Tomaso Poggio

A unified framework for Regularization Networks and Support Vector Machines. Theodoros Evgeniou, Massimiliano Pontil, Tomaso Poggio MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES A.I. Memo No. 1654 March23, 1999

More information

8 APPENDIX. E[m M] = (n S )(1 exp( exp(s min + c M))) (19) E[m M] n exp(s min + c M) (20) 8.1 EMPIRICAL EVALUATION OF SAMPLING

8 APPENDIX. E[m M] = (n S )(1 exp( exp(s min + c M))) (19) E[m M] n exp(s min + c M) (20) 8.1 EMPIRICAL EVALUATION OF SAMPLING 8 APPENDIX 8.1 EMPIRICAL EVALUATION OF SAMPLING We wish to evauate the empirica accuracy of our samping technique on concrete exampes. We do this in two ways. First, we can sort the eements by probabiity

More information

Testing for the Existence of Clusters

Testing for the Existence of Clusters Testing for the Existence of Custers Caudio Fuentes and George Casea University of Forida November 13, 2008 Abstract The detection and determination of custers has been of specia interest, among researchers

More information

The EM Algorithm applied to determining new limit points of Mahler measures

The EM Algorithm applied to determining new limit points of Mahler measures Contro and Cybernetics vo. 39 (2010) No. 4 The EM Agorithm appied to determining new imit points of Maher measures by Souad E Otmani, Georges Rhin and Jean-Marc Sac-Épée Université Pau Veraine-Metz, LMAM,

More information

Soft Clustering on Graphs

Soft Clustering on Graphs Soft Custering on Graphs Kai Yu 1, Shipeng Yu 2, Voker Tresp 1 1 Siemens AG, Corporate Technoogy 2 Institute for Computer Science, University of Munich kai.yu@siemens.com, voker.tresp@siemens.com spyu@dbs.informatik.uni-muenchen.de

More information

Related Topics Maxwell s equations, electrical eddy field, magnetic field of coils, coil, magnetic flux, induced voltage

Related Topics Maxwell s equations, electrical eddy field, magnetic field of coils, coil, magnetic flux, induced voltage Magnetic induction TEP Reated Topics Maxwe s equations, eectrica eddy fied, magnetic fied of cois, coi, magnetic fux, induced votage Principe A magnetic fied of variabe frequency and varying strength is

More information

Lecture Note 3: Stationary Iterative Methods

Lecture Note 3: Stationary Iterative Methods MATH 5330: Computationa Methods of Linear Agebra Lecture Note 3: Stationary Iterative Methods Xianyi Zeng Department of Mathematica Sciences, UTEP Stationary Iterative Methods The Gaussian eimination (or

More information

Mat 1501 lecture notes, penultimate installment

Mat 1501 lecture notes, penultimate installment Mat 1501 ecture notes, penutimate instament 1. bounded variation: functions of a singe variabe optiona) I beieve that we wi not actuay use the materia in this section the point is mainy to motivate the

More information

Data Mining Technology for Failure Prognostic of Avionics

Data Mining Technology for Failure Prognostic of Avionics IEEE Transactions on Aerospace and Eectronic Systems. Voume 38, #, pp.388-403, 00. Data Mining Technoogy for Faiure Prognostic of Avionics V.A. Skormin, Binghamton University, Binghamton, NY, 1390, USA

More information

Active Learning & Experimental Design

Active Learning & Experimental Design Active Learning & Experimenta Design Danie Ting Heaviy modified, of course, by Lye Ungar Origina Sides by Barbara Engehardt and Aex Shyr Lye Ungar, University of Pennsyvania Motivation u Data coection

More information

SVM: Terminology 1(6) SVM: Terminology 2(6)

SVM: Terminology 1(6) SVM: Terminology 2(6) Andrew Kusiak Inteigent Systems Laboratory 39 Seamans Center he University of Iowa Iowa City, IA 54-57 SVM he maxima margin cassifier is simiar to the perceptron: It aso assumes that the data points are

More information

Control Chart For Monitoring Nonparametric Profiles With Arbitrary Design

Control Chart For Monitoring Nonparametric Profiles With Arbitrary Design Contro Chart For Monitoring Nonparametric Profies With Arbitrary Design Peihua Qiu 1 and Changiang Zou 2 1 Schoo of Statistics, University of Minnesota, USA 2 LPMC and Department of Statistics, Nankai

More information

Two view learning: SVM-2K, Theory and Practice

Two view learning: SVM-2K, Theory and Practice Two view earning: SVM-2K, Theory and Practice Jason D.R. Farquhar jdrf99r@ecs.soton.ac.uk Hongying Meng hongying@cs.york.ac.uk David R. Hardoon drh@ecs.soton.ac.uk John Shawe-Tayor jst@ecs.soton.ac.uk

More information

Research of Data Fusion Method of Multi-Sensor Based on Correlation Coefficient of Confidence Distance

Research of Data Fusion Method of Multi-Sensor Based on Correlation Coefficient of Confidence Distance Send Orders for Reprints to reprints@benthamscience.ae 340 The Open Cybernetics & Systemics Journa, 015, 9, 340-344 Open Access Research of Data Fusion Method of Muti-Sensor Based on Correation Coefficient

More information

Distributed average consensus: Beyond the realm of linearity

Distributed average consensus: Beyond the realm of linearity Distributed average consensus: Beyond the ream of inearity Usman A. Khan, Soummya Kar, and José M. F. Moura Department of Eectrica and Computer Engineering Carnegie Meon University 5 Forbes Ave, Pittsburgh,

More information

A Statistical Framework for Real-time Event Detection in Power Systems

A Statistical Framework for Real-time Event Detection in Power Systems 1 A Statistica Framework for Rea-time Event Detection in Power Systems Noan Uhrich, Tim Christman, Phiip Swisher, and Xichen Jiang Abstract A quickest change detection (QCD) agorithm is appied to the probem

More information

Appendix A: MATLAB commands for neural networks

Appendix A: MATLAB commands for neural networks Appendix A: MATLAB commands for neura networks 132 Appendix A: MATLAB commands for neura networks p=importdata('pn.xs'); t=importdata('tn.xs'); [pn,meanp,stdp,tn,meant,stdt]=prestd(p,t); for m=1:10 net=newff(minmax(pn),[m,1],{'tansig','purein'},'trainm');

More information

Formulas for Angular-Momentum Barrier Factors Version II

Formulas for Angular-Momentum Barrier Factors Version II BNL PREPRINT BNL-QGS-06-101 brfactor1.tex Formuas for Anguar-Momentum Barrier Factors Version II S. U. Chung Physics Department, Brookhaven Nationa Laboratory, Upton, NY 11973 March 19, 2015 abstract A

More information

Two-sample inference for normal mean vectors based on monotone missing data

Two-sample inference for normal mean vectors based on monotone missing data Journa of Mutivariate Anaysis 97 (006 6 76 wwweseviercom/ocate/jmva Two-sampe inference for norma mean vectors based on monotone missing data Jianqi Yu a, K Krishnamoorthy a,, Maruthy K Pannaa b a Department

More information

Statistics for Applications. Chapter 7: Regression 1/43

Statistics for Applications. Chapter 7: Regression 1/43 Statistics for Appications Chapter 7: Regression 1/43 Heuristics of the inear regression (1) Consider a coud of i.i.d. random points (X i,y i ),i =1,...,n : 2/43 Heuristics of the inear regression (2)

More information

MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES

MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES Separation of variabes is a method to sove certain PDEs which have a warped product structure. First, on R n, a inear PDE of order m is

More information

Chemical Kinetics Part 2

Chemical Kinetics Part 2 Integrated Rate Laws Chemica Kinetics Part 2 The rate aw we have discussed thus far is the differentia rate aw. Let us consider the very simpe reaction: a A à products The differentia rate reates the rate

More information

Rate-Distortion Theory of Finite Point Processes

Rate-Distortion Theory of Finite Point Processes Rate-Distortion Theory of Finite Point Processes Günther Koiander, Dominic Schuhmacher, and Franz Hawatsch, Feow, IEEE Abstract We study the compression of data in the case where the usefu information

More information

Asymptotic Properties of a Generalized Cross Entropy Optimization Algorithm

Asymptotic Properties of a Generalized Cross Entropy Optimization Algorithm 1 Asymptotic Properties of a Generaized Cross Entropy Optimization Agorithm Zijun Wu, Michae Koonko, Institute for Appied Stochastics and Operations Research, Caustha Technica University Abstract The discrete

More information

General Certificate of Education Advanced Level Examination June 2010

General Certificate of Education Advanced Level Examination June 2010 Genera Certificate of Education Advanced Leve Examination June 2010 Human Bioogy HBI6T/P10/task Unit 6T A2 Investigative Skis Assignment Task Sheet The effect of temperature on the rate of photosynthesis

More information

Uniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete

Uniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete Uniprocessor Feasibiity of Sporadic Tasks with Constrained Deadines is Strongy conp-compete Pontus Ekberg and Wang Yi Uppsaa University, Sweden Emai: {pontus.ekberg yi}@it.uu.se Abstract Deciding the feasibiity

More information

Efficiently Generating Random Bits from Finite State Markov Chains

Efficiently Generating Random Bits from Finite State Markov Chains 1 Efficienty Generating Random Bits from Finite State Markov Chains Hongchao Zhou and Jehoshua Bruck, Feow, IEEE Abstract The probem of random number generation from an uncorreated random source (of unknown

More information

Approximated MLC shape matrix decomposition with interleaf collision constraint

Approximated MLC shape matrix decomposition with interleaf collision constraint Approximated MLC shape matrix decomposition with intereaf coision constraint Thomas Kainowski Antje Kiese Abstract Shape matrix decomposition is a subprobem in radiation therapy panning. A given fuence

More information

Inductive Bias: How to generalize on novel data. CS Inductive Bias 1

Inductive Bias: How to generalize on novel data. CS Inductive Bias 1 Inductive Bias: How to generaize on nove data CS 478 - Inductive Bias 1 Overfitting Noise vs. Exceptions CS 478 - Inductive Bias 2 Non-Linear Tasks Linear Regression wi not generaize we to the task beow

More information

Robust Sensitivity Analysis for Linear Programming with Ellipsoidal Perturbation

Robust Sensitivity Analysis for Linear Programming with Ellipsoidal Perturbation Robust Sensitivity Anaysis for Linear Programming with Eipsoida Perturbation Ruotian Gao and Wenxun Xing Department of Mathematica Sciences Tsinghua University, Beijing, China, 100084 September 27, 2017

More information

Improving the Accuracy of Boolean Tomography by Exploiting Path Congestion Degrees

Improving the Accuracy of Boolean Tomography by Exploiting Path Congestion Degrees Improving the Accuracy of Booean Tomography by Expoiting Path Congestion Degrees Zhiyong Zhang, Gaoei Fei, Fucai Yu, Guangmin Hu Schoo of Communication and Information Engineering, University of Eectronic

More information

Alberto Maydeu Olivares Instituto de Empresa Marketing Dept. C/Maria de Molina Madrid Spain

Alberto Maydeu Olivares Instituto de Empresa Marketing Dept. C/Maria de Molina Madrid Spain CORRECTIONS TO CLASSICAL PROCEDURES FOR ESTIMATING THURSTONE S CASE V MODEL FOR RANKING DATA Aberto Maydeu Oivares Instituto de Empresa Marketing Dept. C/Maria de Moina -5 28006 Madrid Spain Aberto.Maydeu@ie.edu

More information

XSAT of linear CNF formulas

XSAT of linear CNF formulas XSAT of inear CN formuas Bernd R. Schuh Dr. Bernd Schuh, D-50968 Kön, Germany; bernd.schuh@netcoogne.de eywords: compexity, XSAT, exact inear formua, -reguarity, -uniformity, NPcompeteness Abstract. Open

More information

BALANCING REGULAR MATRIX PENCILS

BALANCING REGULAR MATRIX PENCILS BALANCING REGULAR MATRIX PENCILS DAMIEN LEMONNIER AND PAUL VAN DOOREN Abstract. In this paper we present a new diagona baancing technique for reguar matrix pencis λb A, which aims at reducing the sensitivity

More information

4 Separation of Variables

4 Separation of Variables 4 Separation of Variabes In this chapter we describe a cassica technique for constructing forma soutions to inear boundary vaue probems. The soution of three cassica (paraboic, hyperboic and eiptic) PDE

More information

Stochastic Variational Inference with Gradient Linearization

Stochastic Variational Inference with Gradient Linearization Stochastic Variationa Inference with Gradient Linearization Suppementa Materia Tobias Pötz * Anne S Wannenwetsch Stefan Roth Department of Computer Science, TU Darmstadt Preface In this suppementa materia,

More information

On the Goal Value of a Boolean Function

On the Goal Value of a Boolean Function On the Goa Vaue of a Booean Function Eric Bach Dept. of CS University of Wisconsin 1210 W. Dayton St. Madison, WI 53706 Lisa Heerstein Dept of CSE NYU Schoo of Engineering 2 Metrotech Center, 10th Foor

More information

Statistical Inference, Econometric Analysis and Matrix Algebra

Statistical Inference, Econometric Analysis and Matrix Algebra Statistica Inference, Econometric Anaysis and Matrix Agebra Bernhard Schipp Water Krämer Editors Statistica Inference, Econometric Anaysis and Matrix Agebra Festschrift in Honour of Götz Trenker Physica-Verag

More information

First-Order Corrections to Gutzwiller s Trace Formula for Systems with Discrete Symmetries

First-Order Corrections to Gutzwiller s Trace Formula for Systems with Discrete Symmetries c 26 Noninear Phenomena in Compex Systems First-Order Corrections to Gutzwier s Trace Formua for Systems with Discrete Symmetries Hoger Cartarius, Jörg Main, and Günter Wunner Institut für Theoretische

More information

Approximated MLC shape matrix decomposition with interleaf collision constraint

Approximated MLC shape matrix decomposition with interleaf collision constraint Agorithmic Operations Research Vo.4 (29) 49 57 Approximated MLC shape matrix decomposition with intereaf coision constraint Antje Kiese and Thomas Kainowski Institut für Mathematik, Universität Rostock,

More information

Reichenbachian Common Cause Systems

Reichenbachian Common Cause Systems Reichenbachian Common Cause Systems G. Hofer-Szabó Department of Phiosophy Technica University of Budapest e-mai: gszabo@hps.ete.hu Mikós Rédei Department of History and Phiosophy of Science Eötvös University,

More information

arxiv: v1 [cs.lg] 31 Oct 2017

arxiv: v1 [cs.lg] 31 Oct 2017 ACCELERATED SPARSE SUBSPACE CLUSTERING Abofaz Hashemi and Haris Vikao Department of Eectrica and Computer Engineering, University of Texas at Austin, Austin, TX, USA arxiv:7.26v [cs.lg] 3 Oct 27 ABSTRACT

More information

Chemical Kinetics Part 2. Chapter 16

Chemical Kinetics Part 2. Chapter 16 Chemica Kinetics Part 2 Chapter 16 Integrated Rate Laws The rate aw we have discussed thus far is the differentia rate aw. Let us consider the very simpe reaction: a A à products The differentia rate reates

More information

Automobile Prices in Market Equilibrium. Berry, Pakes and Levinsohn

Automobile Prices in Market Equilibrium. Berry, Pakes and Levinsohn Automobie Prices in Market Equiibrium Berry, Pakes and Levinsohn Empirica Anaysis of demand and suppy in a differentiated products market: equiibrium in the U.S. automobie market. Oigopoistic Differentiated

More information

SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS

SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS ISEE 1 SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS By Yingying Fan and Jinchi Lv University of Southern Caifornia This Suppementary Materia

More information

Stochastic Complement Analysis of Multi-Server Threshold Queues. with Hysteresis. Abstract

Stochastic Complement Analysis of Multi-Server Threshold Queues. with Hysteresis. Abstract Stochastic Compement Anaysis of Muti-Server Threshod Queues with Hysteresis John C.S. Lui The Dept. of Computer Science & Engineering The Chinese University of Hong Kong Leana Goubchik Dept. of Computer

More information

C. Fourier Sine Series Overview

C. Fourier Sine Series Overview 12 PHILIP D. LOEWEN C. Fourier Sine Series Overview Let some constant > be given. The symboic form of the FSS Eigenvaue probem combines an ordinary differentia equation (ODE) on the interva (, ) with a

More information

NOISE-INDUCED STABILIZATION OF STOCHASTIC DIFFERENTIAL EQUATIONS

NOISE-INDUCED STABILIZATION OF STOCHASTIC DIFFERENTIAL EQUATIONS NOISE-INDUCED STABILIZATION OF STOCHASTIC DIFFERENTIAL EQUATIONS TONY ALLEN, EMILY GEBHARDT, AND ADAM KLUBALL 3 ADVISOR: DR. TIFFANY KOLBA 4 Abstract. The phenomenon of noise-induced stabiization occurs

More information

FORECASTING TELECOMMUNICATIONS DATA WITH AUTOREGRESSIVE INTEGRATED MOVING AVERAGE MODELS

FORECASTING TELECOMMUNICATIONS DATA WITH AUTOREGRESSIVE INTEGRATED MOVING AVERAGE MODELS FORECASTING TEECOMMUNICATIONS DATA WITH AUTOREGRESSIVE INTEGRATED MOVING AVERAGE MODES Niesh Subhash naawade a, Mrs. Meenakshi Pawar b a SVERI's Coege of Engineering, Pandharpur. nieshsubhash15@gmai.com

More information

The influence of temperature of photovoltaic modules on performance of solar power plant

The influence of temperature of photovoltaic modules on performance of solar power plant IOSR Journa of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vo. 05, Issue 04 (Apri. 2015), V1 PP 09-15 www.iosrjen.org The infuence of temperature of photovotaic modues on performance

More information

Approach to Identifying Raindrop Vibration Signal Detected by Optical Fiber

Approach to Identifying Raindrop Vibration Signal Detected by Optical Fiber Sensors & Transducers, o. 6, Issue, December 3, pp. 85-9 Sensors & Transducers 3 by IFSA http://www.sensorsporta.com Approach to Identifying Raindrop ibration Signa Detected by Optica Fiber ongquan QU,

More information

Turbo Codes. Coding and Communication Laboratory. Dept. of Electrical Engineering, National Chung Hsing University

Turbo Codes. Coding and Communication Laboratory. Dept. of Electrical Engineering, National Chung Hsing University Turbo Codes Coding and Communication Laboratory Dept. of Eectrica Engineering, Nationa Chung Hsing University Turbo codes 1 Chapter 12: Turbo Codes 1. Introduction 2. Turbo code encoder 3. Design of intereaver

More information

Partial permutation decoding for MacDonald codes

Partial permutation decoding for MacDonald codes Partia permutation decoding for MacDonad codes J.D. Key Department of Mathematics and Appied Mathematics University of the Western Cape 7535 Bevie, South Africa P. Seneviratne Department of Mathematics

More information

International Journal of Mass Spectrometry

International Journal of Mass Spectrometry Internationa Journa of Mass Spectrometry 280 (2009) 179 183 Contents ists avaiabe at ScienceDirect Internationa Journa of Mass Spectrometry journa homepage: www.esevier.com/ocate/ijms Stark mixing by ion-rydberg

More information

IE 361 Exam 1. b) Give *&% confidence limits for the bias of this viscometer. (No need to simplify.)

IE 361 Exam 1. b) Give *&% confidence limits for the bias of this viscometer. (No need to simplify.) October 9, 00 IE 6 Exam Prof. Vardeman. The viscosity of paint is measured with a "viscometer" in units of "Krebs." First, a standard iquid of "known" viscosity *# Krebs is tested with a company viscometer

More information

CONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION

CONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION CONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION SAHAR KARIMI AND STEPHEN VAVASIS Abstract. In this paper we present a variant of the conjugate gradient (CG) agorithm in which we invoke a subspace minimization

More information

c 2016 Georgios Rovatsos

c 2016 Georgios Rovatsos c 2016 Georgios Rovatsos QUICKEST CHANGE DETECTION WITH APPLICATIONS TO LINE OUTAGE DETECTION BY GEORGIOS ROVATSOS THESIS Submitted in partia fufiment of the requirements for the degree of Master of Science

More information

arxiv: v1 [math.co] 17 Dec 2018

arxiv: v1 [math.co] 17 Dec 2018 On the Extrema Maximum Agreement Subtree Probem arxiv:1812.06951v1 [math.o] 17 Dec 2018 Aexey Markin Department of omputer Science, Iowa State University, USA amarkin@iastate.edu Abstract Given two phyogenetic

More information

Algorithms to solve massively under-defined systems of multivariate quadratic equations

Algorithms to solve massively under-defined systems of multivariate quadratic equations Agorithms to sove massivey under-defined systems of mutivariate quadratic equations Yasufumi Hashimoto Abstract It is we known that the probem to sove a set of randomy chosen mutivariate quadratic equations

More information

On the evaluation of saving-consumption plans

On the evaluation of saving-consumption plans On the evauation of saving-consumption pans Steven Vanduffe Jan Dhaene Marc Goovaerts Juy 13, 2004 Abstract Knowedge of the distribution function of the stochasticay compounded vaue of a series of future

More information

An explicit Jordan Decomposition of Companion matrices

An explicit Jordan Decomposition of Companion matrices An expicit Jordan Decomposition of Companion matrices Fermín S V Bazán Departamento de Matemática CFM UFSC 88040-900 Forianópois SC E-mai: fermin@mtmufscbr S Gratton CERFACS 42 Av Gaspard Coriois 31057

More information

Akaike Information Criterion for ANOVA Model with a Simple Order Restriction

Akaike Information Criterion for ANOVA Model with a Simple Order Restriction Akaike Information Criterion for ANOVA Mode with a Simpe Order Restriction Yu Inatsu * Department of Mathematics, Graduate Schoo of Science, Hiroshima University ABSTRACT In this paper, we consider Akaike

More information

Expectation-Maximization for Estimating Parameters for a Mixture of Poissons

Expectation-Maximization for Estimating Parameters for a Mixture of Poissons Expectation-Maximization for Estimating Parameters for a Mixture of Poissons Brandon Maone Department of Computer Science University of Hesini February 18, 2014 Abstract This document derives, in excrutiating

More information

arxiv: v1 [math.ca] 6 Mar 2017

arxiv: v1 [math.ca] 6 Mar 2017 Indefinite Integras of Spherica Besse Functions MIT-CTP/487 arxiv:703.0648v [math.ca] 6 Mar 07 Joyon K. Boomfied,, Stephen H. P. Face,, and Zander Moss, Center for Theoretica Physics, Laboratory for Nucear

More information

A CLUSTERING LAW FOR SOME DISCRETE ORDER STATISTICS

A CLUSTERING LAW FOR SOME DISCRETE ORDER STATISTICS J App Prob 40, 226 241 (2003) Printed in Israe Appied Probabiity Trust 2003 A CLUSTERING LAW FOR SOME DISCRETE ORDER STATISTICS SUNDER SETHURAMAN, Iowa State University Abstract Let X 1,X 2,,X n be a sequence

More information

Traffic data collection

Traffic data collection Chapter 32 Traffic data coection 32.1 Overview Unike many other discipines of the engineering, the situations that are interesting to a traffic engineer cannot be reproduced in a aboratory. Even if road

More information