Scaling up the Automatic Statistician: Scalable Structure Discovery using Gaussian Processes


Scaling up the Automatic Statistician: Scalable Structure Discovery using Gaussian Processes

Hyunjik Kim and Yee Whye Teh, University of Oxford, DeepMind

Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS) 2018, Lanzarote, Spain. PMLR: Volume 84. Copyright 2018 by the author(s).

Abstract

Automating statistical modelling is a challenging problem in artificial intelligence. The Automatic Statistician takes a first step in this direction, by employing a kernel search algorithm with Gaussian Processes (GP) to provide interpretable statistical models for regression problems. However this does not scale due to its O(N^3) running time for the model selection. We propose Scalable Kernel Composition (SKC), a scalable kernel search algorithm that extends the Automatic Statistician to bigger data sets. In doing so, we derive a cheap upper bound on the GP marginal likelihood that sandwiches the marginal likelihood with the variational lower bound. We show that the upper bound is significantly tighter than the lower bound and thus useful for model selection.

1 Introduction

Automated statistical modelling is an area of research in its early stages, yet it is becoming an increasingly important problem [14]. As a growing number of disciplines use statistical analyses and models to help achieve their goals, the demand for statisticians, machine learning researchers and data scientists is at an all-time high. Automated systems for statistical modelling serve to assist such human resources, if not as a best alternative where there is a shortage. An example of a fruitful attempt at automated statistical modelling in nonparametric regression is Compositional Kernel Search (CKS) [10], an algorithm that fits a Gaussian Process (GP) to the data and automatically chooses a suitable parametric form of the kernel. This leads to high predictive performance that matches kernels hand-selected by GP experts [27]. There also exist other approaches that tackle this model selection problem by using a more flexible kernel [2, 24, 30, 41, 43]. However the distinctive feature of Duvenaud et al. [10] is that the resulting models are interpretable; the kernels are constructed in such a way that we can use them to describe patterns in the data, and thus they can be used for automated exploratory data analysis. Lloyd et al. [19] exploit this to generate natural language analyses from these kernels, a procedure that they name Automatic Bayesian Covariance Discovery (ABCD). The Automatic Statistician (1) implements this to output a 10-15 page report when given data input. However, a limitation of ABCD is that it does not scale; due to the O(N^3) time for inference in GPs, the analysis is constrained to small data sets, specialising in one-dimensional time series data. This is undesirable not only because the average size of data sets is growing fast, but also because there is potentially more information in bigger data, implying a greater need for more expressive models that can discover finer structure. This paper proposes Scalable Kernel Composition (SKC), a scalable extension of CKS, to push the boundaries of automated interpretable statistical modelling to bigger data. In summary, our work makes the following contributions: We propose the first scalable version of the Automatic Statistician that scales up to medium-sized data sets by reducing algorithmic complexity from O(N^3) to O(N^2) and enhancing parallelisability.
We derive a novel cheap upper bound on the GP marginal likelihood that is used in SKC together with the variational lower bound [37] to sandwich the GP marginal likelihood. We show that our upper bound is significantly tighter than the lower bound and plays an important role in model selection.

(1) See the Automatic Statistician website for example analyses.

2 ABCD and CKS

The Compositional Kernel Search (CKS) algorithm [10] builds on the idea that the sum and product of two positive definite kernels are also positive definite. Starting off with a set B of base kernels defined on R x R, the algorithm searches through the space of zero-mean GPs with kernels that can be expressed in terms of sums and products of these base kernels. B = {SE, LIN, PER} is used, which correspond to the squared exponential, linear and periodic kernel respectively (see Appendix C for the form of these base kernels). Thus candidate kernels form an open-ended space of GP models, allowing for an expressive model. Such approaches to structure discovery have also appeared in [12, 13]. A greedy search is employed to explore this space, with each kernel scored by the Bayesian Information Criterion (BIC) [31] (2) after optimising the kernel hyperparameters by type II maximum likelihood (ML-II). See Appendix B for the algorithm in detail. The resulting kernel can be simplified so that it is expressed as a sum of products of base kernels, which has the notable benefit of interpretability. In particular, note that f_1 ~ GP(0, k_1), f_2 ~ GP(0, k_2) with f_1 and f_2 independent implies f_1 + f_2 ~ GP(0, k_1 + k_2). So a GP whose kernel is a sum of products of base kernels can be interpreted as a sum of GPs, each with structure given by a product of base kernels. Now each base kernel in a product modifies the model in a consistent way. For example, multiplication by SE converts global structure into local structure since SE(x, x') decreases exponentially with |x - x'|, and multiplication by PER is equivalent to multiplication of the modelled function by a periodic function (see Lloyd et al. [19] for detailed interpretations of different combinations). This observation is used in Automatic Bayesian Covariance Discovery (ABCD) [19], giving a natural language description of the resulting function modelled by the composite kernel. In summary, ABCD consists of two algorithms: the compositional kernel search CKS, and the natural language translation of the kernel into a piece of exploratory data analysis.

(2) BIC = log marginal likelihood with a model complexity penalty. We use a definition where higher means better model fit. See Appendix A.

3 Scaling up ABCD

ABCD provides a framework for a natural extension to big data settings, in that we only need to be able to scale up CKS; then the natural language description of models can be directly applied. The difficulty of this extension lies in the O(N^3) time for evaluation of the GP marginal likelihood and its gradients with respect to the kernel hyperparameters. A naive approach is to subsample the data to reduce N, but then we may fail to capture global structure such as periodicities with long periods, or omit a set of points displaying a certain local structure. We show failure cases of random subsampling in Section 4.3. Regarding more strategic subsampling, the possibility of a generic subsampling algorithm for GPs that is able to capture the aforementioned properties of the data is a challenging research problem in itself. Alternatively, it is tempting to use either an approximate marginal likelihood or the marginal likelihood of an approximate model as a proxy for the likelihood [25, 32, 34, 37], especially with modern GP approximations scaling to large data sets with millions of data points [15, 42]. However, such scalable GP methods are limited in interpretability as they often behave very differently from the full GP, lacking guarantees that the chosen kernel faithfully reflects the actual structure in the data. In other words, the real challenge is to scale up the GP while preserving interpretability, and this is a difficult problem due to the tradeoff between scalability and accuracy of GP approximations.
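To make the quantities concrete, the following is a minimal Python sketch (not the authors' implementation) of the objects CKS works with: simple versions of the SE, LIN and PER base kernels (the exact parameterisations used in the paper are given in its Appendix C), a composite kernel built from sums and products, the exact log marginal likelihood of a zero-mean GP, and the BIC used to score kernels (higher is better). All hyperparameter values are illustrative.

import numpy as np

def SE(x, xp, ell=1.0, var=1.0):            # squared exponential, 1-D inputs
    return var * np.exp(-0.5 * (x - xp.T) ** 2 / ell ** 2)

def LIN(x, xp, var=1.0, loc=0.0):           # linear
    return var * (x - loc) @ (xp - loc).T

def PER(x, xp, ell=1.0, period=1.0, var=1.0):  # periodic
    return var * np.exp(-2.0 * np.sin(np.pi * np.abs(x - xp.T) / period) ** 2 / ell ** 2)

def log_marginal_likelihood(K, y, noise_var=0.1):
    # exact GP log marginal likelihood: needs a Cholesky of an N x N matrix, O(N^3)
    N = len(y)
    L = np.linalg.cholesky(K + noise_var * np.eye(N))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * N * np.log(2 * np.pi)

def bic(log_ml, n_hyperparams, N):
    return log_ml - 0.5 * n_hyperparams * np.log(N)

x = np.linspace(0, 4, 300)[:, None]
y = np.sin(2 * np.pi * x[:, 0]) * np.exp(-0.1 * x[:, 0]) + 0.05 * np.random.randn(300)
K = SE(x, x, ell=2.0) * PER(x, x) + LIN(x, x, var=0.1)   # sums/products of PSD kernels are PSD
print(bic(log_marginal_likelihood(K, y), n_hyperparams=7, N=len(y)))

This cubic cost must be paid once per candidate kernel and per hyperparameter optimisation step, which is the bottleneck that SKC targets.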
Our work pushes the frontiers of interpretable GPs to medium-sized data (N = 10K-100K) by reducing the computational complexity of ABCD from O(N^3) to O(N^2). Extending this to large data sets (N = 100K-1M) is a difficult open problem. Our approach is as follows: we provide a cheap lower bound and upper bound that sandwich the marginal likelihood, and we use this interval for model selection. To do so, we first give a brief overview of the relevant work on low-rank kernel approximations used for scaling up GPs, and later outline how they can be applied to obtain cheap lower and upper bounds.

3.1 Nyström Methods and Sparse GPs

The Nyström method [9, 40] selects a set of m inducing points in the input space R^D that attempt to explain all the covariance in the Gram matrix of the kernel; the kernel is evaluated for each pair of inducing points and also between the inducing points and the data, giving matrices K_MM and K_NM = K_MN^T. These are used to create the Nyström approximation K_hat = K_NM (K_MM)^+ K_MN of K = K_NN, where ^+ denotes the pseudo-inverse. Applying a Cholesky decomposition to K_MM, we see that the approximation admits the low-rank form Phi Phi^T and so allows efficient computation of determinants and inverses in O(Nm^2) time (see Appendix D). We later use the Nyström approximation to give an upper bound on the log marginal likelihood.
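The following is a minimal sketch of this construction (assumptions: an SE kernel and inducing points taken as a random subset of the data, which is also what the experiments below end up recommending). It shows why the low-rank form Phi Phi^T gives determinants and inverses in O(Nm^2) time, via the matrix determinant lemma and the Woodbury identity.

import numpy as np

def SE(x, xp, ell=1.0, var=1.0):
    return var * np.exp(-0.5 * (x - xp.T) ** 2 / ell ** 2)

N, m, s2 = 2000, 50, 0.1                      # s2 is the noise variance sigma^2
x = np.sort(np.random.rand(N, 1), axis=0)
y = np.sin(6 * x[:, 0]) + np.sqrt(s2) * np.random.randn(N)
z = x[np.random.choice(N, m, replace=False)]  # inducing points

K_mm = SE(z, z) + 1e-8 * np.eye(m)            # jitter for numerical stability
K_Nm = SE(x, z)
L = np.linalg.cholesky(K_mm)
Phi = np.linalg.solve(L, K_Nm.T).T            # K_hat = Phi @ Phi.T, rank m

A = np.eye(m) + Phi.T @ Phi / s2              # only m x m systems from here on
# log det(K_hat + s2*I) via the matrix determinant lemma
logdet = np.linalg.slogdet(A)[1] + N * np.log(s2)
# (K_hat + s2*I)^{-1} y via the Woodbury identity
Kinv_y = y / s2 - Phi @ np.linalg.solve(A, Phi.T @ y) / s2 ** 2

Since m is much smaller than N, forming Phi and A dominates the cost, which is why all the bound computations below stay quadratic or better in N.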

The Nyström approximation arises naturally in the sparse GP literature, where certain distributions are approximated by simpler ones involving f_m, the GP evaluated at the m inducing points: the Deterministic Training Conditional (DTC) approximation [32] defines a model that gives the marginal likelihood q(y) = N(y | 0, K_hat + sigma^2 I) (y is the vector of observations), whereas the Fully Independent Conditional (FIC) approximation [34] gives q(y) = N(y | 0, K_hat + diag(K - K_hat) + sigma^2 I), correcting the Nyström approximation along the diagonal. The Partially Independent Conditional (PIC) approximation [25] further improves this by correcting the Nyström approximation on block diagonals, with blocks typically of size m. Note that the approximation is no longer low rank for FIC and PIC, but matrix inversion can still be computed in O(Nm^2) time by Woodbury's Lemma (see Appendix D). The variational inducing points method (VAR) [37] is rather different to DTC/FIC/PIC in that it gives the following variational lower bound on the log marginal likelihood:

log N(y | 0, K_hat + sigma^2 I) - (1/(2 sigma^2)) Tr(K - K_hat)    (1)

This lower bound is optimised with respect to the inducing points and the kernel hyperparameters, and is shown in the paper to successfully yield tight lower bounds in O(Nm^2) time for reasonable values of m. Another useful property of VAR is that the lower bound can only increase as the set of inducing points grows [21, 37]. It is also known that VAR always improves with extra computation, and that it successfully recovers the true posterior GP in most cases, contrary to other sparse GP methods [4]. Hence this is what we use in SKC to obtain a lower bound on the marginal likelihood and to optimise the hyperparameters. We use 10 random initialisations of the hyperparameters and choose the one with the highest lower bound after optimisation.

3.2 A cheap and tight upper bound on the log marginal likelihood

Fixing the hyperparameters to be those tuned by VAR, we seek a cheap upper bound on the marginal likelihood. Upper bounds and lower bounds are qualitatively different, and in general it is more difficult to obtain an upper bound than a lower bound, for the following reason: first note that the marginal likelihood is the integral of the likelihood with respect to the prior density of the parameters. Hence to obtain a lower bound it suffices to exhibit regions in the parameter space giving high likelihood. However, to obtain an upper bound one must demonstrate the absence or lack of likelihood mass outside a certain region, an arguably more difficult task. There has been some work on the subject [5, 17], but to the best of our knowledge there has not been any work on cheap upper bounds to the marginal likelihood in large N settings. So finding an upper bound from the perspective of the marginal likelihood can be difficult. Instead, we exploit the fact that the GP marginal likelihood has an analytic form, and treat it as a function of K. The GP log marginal likelihood is composed of two terms and a constant:

log p(y) = log N(y | 0, K + sigma^2 I) = -(1/2) log det(K + sigma^2 I) - (1/2) y^T (K + sigma^2 I)^{-1} y - (N/2) log(2 pi)    (2)

We give separate upper bounds on the negative log determinant (NLD) term and the negative inner product (NIP) term. For NLD, it has been proven that

-(1/2) log det(K + sigma^2 I) <= -(1/2) log det(K_hat + sigma^2 I)    (3)

a consequence of K - K_hat being a Schur complement of K_MM and hence positive semi-definite (e.g. [3]). So the Nyström approximation plugged into the NLD term serves as an upper bound that can be computed in O(Nm^2) time (see Appendix D).
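The two quantities introduced so far, the collapsed lower bound of eq. (1) and the NLD upper bound of eq. (3), can both be written in terms of the low-rank factor Phi from the Nyström sketch above. The following is a minimal sketch (not the paper's code); Kdiag is the diagonal of the exact kernel matrix (all ones for an SE kernel with unit variance), and the VAR optimisation over hyperparameters and inducing points is deliberately omitted here: the sketch only evaluates the bounds at fixed values.

import numpy as np

def var_lower_bound(Phi, y, s2, Kdiag):
    # eq. (1): log N(y | 0, K_hat + s2*I) - Tr(K - K_hat) / (2*s2)
    N, m = Phi.shape
    A = np.eye(m) + Phi.T @ Phi / s2                          # m x m
    logdet_Khat = np.linalg.slogdet(A)[1] + N * np.log(s2)    # log det(K_hat + s2*I)
    quad = y @ y / s2 - (Phi.T @ y) @ np.linalg.solve(A, Phi.T @ y) / s2 ** 2
    log_q = -0.5 * logdet_Khat - 0.5 * quad - 0.5 * N * np.log(2 * np.pi)
    trace_term = (Kdiag.sum() - (Phi * Phi).sum()) / (2 * s2)  # Tr(K - K_hat) >= 0
    return log_q - trace_term

def nld_upper_bound(Phi, s2):
    # eq. (3): -1/2 log det(K + s2*I) <= -1/2 log det(K_hat + s2*I)
    N, m = Phi.shape
    A = np.eye(m) + Phi.T @ Phi / s2
    return -0.5 * (np.linalg.slogdet(A)[1] + N * np.log(s2))

Both functions only solve m x m systems after the O(Nm^2) products involving Phi, which is what keeps the bound evaluations cheap.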
As for NIP, we point out that

lambda y^T (K + sigma^2 I)^{-1} y = min_{f in H} sum_{i=1}^N (y_i - f(x_i))^2 + lambda ||f||_H^2, with lambda = sigma^2,

the optimal value of the objective function in kernel ridge regression, where H is the Reproducing Kernel Hilbert Space associated with k (e.g. [23]). The dual problem, whose objective function has the same optimal value, is max_{alpha in R^N} -lambda [alpha^T (K + sigma^2 I) alpha - 2 alpha^T y]. So we have the following upper bound:

-(1/2) y^T (K + sigma^2 I)^{-1} y <= (1/2) alpha^T (K + sigma^2 I) alpha - alpha^T y    (4)

for all alpha in R^N. Note that this is also in the form of an objective for conjugate gradients (CG) [33], hence equality is obtained at the optimal value alpha_hat = (K + sigma^2 I)^{-1} y. We can approach the optimum for a tighter bound by using CG or preconditioned CG (PCG) for a fixed number of iterations to get a reasonable approximation to alpha_hat. Each iteration of CG and the computation of the upper bound takes O(N^2) time, but PCG is very fast even for large data sets, and using FIC/PIC as the preconditioner gives the fastest convergence in general [8]. Recall that although the lower bound takes O(Nm^2) to compute, we need m = O(N^beta) for accurate approximations, where beta depends on the data distribution and the kernel [28]. beta is usually close to 0.5, hence the lower bound is also effectively O(N^2). In practice, the upper bound evaluation seems to be a little more expensive than the lower bound evaluation, but we only need to compute the upper bound once, whereas we must evaluate the lower bound and its gradients multiple times for the hyperparameter optimisation.
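The following is a minimal sketch of the NIP upper bound of eq. (4). It uses plain CG from SciPy rather than the FIC/PIC-preconditioned CG reported to converge fastest in the paper; the key point is that any alpha gives a valid upper bound, and a few iterations already make it tight.

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def nip_upper_bound(K, y, s2, iters=20):
    N = len(y)
    A = LinearOperator((N, N), matvec=lambda v: K @ v + s2 * v)  # O(N^2) per matvec
    alpha, _ = cg(A, y, maxiter=iters)            # approximate (K + s2*I)^{-1} y
    return 0.5 * alpha @ (K @ alpha + s2 * alpha) - alpha @ y    # eq. (4) right-hand side

# sanity check on a small random SE-type kernel matrix
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
K = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
y = rng.standard_normal(200)
exact = -0.5 * y @ np.linalg.solve(K + 0.1 * np.eye(200), y)
assert nip_upper_bound(K, y, 0.1) >= exact - 1e-9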

Algorithm 1: Scalable Kernel Composition (SKC)
Input: data x_1, ..., x_n in R^D, y_1, ..., y_n in R, base kernel set B, depth d, number of inducing points m, kernel buffer size S.
Output: k*, the resulting kernel.
For each base kernel on each dimension, obtain lower and upper bounds on the BIC (the BIC interval), set k* to be the kernel with the highest upper bound, and add k* to the kernel buffer K. Set C to the empty set.
for depth = 1 : d do
    From C, add to K all kernels whose BIC intervals overlap with that of k* if there are fewer than S of them, else add the kernels with the top S upper bounds.
    for k in K do
        Add the following kernels to C and obtain their BIC intervals:
        (1) all kernels of the form k + B, where B is any base kernel on any dimension;
        (2) all kernels of the form k x B, where B is any base kernel on any dimension.
    if there exists k in C with a higher upper bound than k*, then set k* to k.

We later confirm in Section 4.1 that the upper bound is fast to compute relative to the lower bound optimisation. We also show empirically that the upper bound is tighter than the lower bound in Section 4.1, and give the following sufficient condition for this to be true:

Proposition 1. Suppose (lambda_hat_i)_{i=1}^N are the eigenvalues of K_hat + sigma^2 I in descending order. If (P)CG for the NIP term converges and lambda_hat_N >= 2 sigma^2, then the upper bound is tighter than the lower bound.

Notice that lambda_hat_N >= sigma^2 for any K_hat, so the assumption is feasible. The proof is in Appendix E. We later show in Section 4 that the upper bound is not only tighter than the lower bound, but also much less sensitive to the choice of inducing points. Hence we use the upper bound to choose between kernels whose BIC intervals overlap.

3.3 SKC: Scalable Kernel Composition using the lower and upper bounds

We base our algorithm on two intuitive claims. First, the lower and upper bounds converge to the true log marginal likelihood for fixed hyperparameters as the number of inducing points m increases. Second, the hyperparameters optimised by VAR converge to those obtained by optimising the log marginal likelihood as m increases. The former is confirmed in Figure 2 as well as in other works (e.g. [4] for the lower bound), and the latter is confirmed in Appendix F. The algorithm proceeds as follows: for each base kernel and a fixed value of m, we compute the lower and upper bounds to obtain an interval for the GP marginal likelihood and hence for the BIC of the kernel, with its hyperparameters optimised by VAR. We rank these kernels by their BIC intervals, using the upper bound as a tie-breaker for kernels with overlapping intervals. We then perform a semi-greedy kernel search, expanding the search tree on all (or some, controlled by the buffer size S) kernels whose intervals overlap with that of the top kernel at the current depth. We recurse to the next depth by computing intervals for these child kernels (parent kernel +/x base kernel), ranking them and further expanding the search tree. This is summarised in Algorithm 1, and Figure 12 in Appendix Q is a visualisation of the search tree for different values of m. Details on the optimisation and initialisation are given in Appendices H and I. The hyperparameters found by VAR may not be the global maximisers of the GP marginal likelihood, but as in ABCD we can optimise the marginal likelihood with multiple random seeds and choose the local optimum closest to the global optimum. One may still question whether the hyperparameter values found by VAR agree with the structure in the data, such as period values and slopes of linear trends.
We show in Sections 4.2 and 4.3 that a small m suffices for this to be the case.

Choice of m. We can guarantee that the lower bound increases with larger m, but cannot guarantee that the upper bound decreases, since we tend to get better hyperparameters for higher m, which boost the marginal likelihood and hence the upper bound. We verify this in Section 4.1. Hence throughout SKC we fix m to be the largest value that one can afford, so that the marginal likelihood with hyperparameters optimised by VAR is as close as possible to the marginal likelihood with optimal hyperparameters. It is natural to wonder whether an adaptive choice of m is possible, using higher m for more promising kernels to tighten their bounds. However, a fair comparison of different kernels via this lower and upper bound requires that they use the same value of m, since a higher m is guaranteed to give a higher lower bound.

Parallelisability. Note that SKC is extremely parallelisable across different random initialisations and different kernels at each depth, as is CKS. In fact the presence of BIC intervals in SKC, and hence of buffers of kernels, allows further parallelisation over kernels of different depths in certain cases (see Appendix G).
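To summarise the search itself, here is a schematic Python sketch of the loop in Algorithm 1. The helper bic_interval(k) is a hypothetical stand-in for the procedure described above (optimise hyperparameters with the VAR lower bound, then return the lower/upper BIC pair); kernels are represented as nested tuples purely for illustration, not as in the authors' implementation.

BASE = ["SE", "LIN", "PER"]            # one copy per input dimension in practice

def children(k):
    # expand a kernel by adding or multiplying a base kernel
    return [("+", k, b) for b in BASE] + [("*", k, b) for b in BASE]

def skc(bic_interval, depth, buffer_size):
    candidates = [(b, bic_interval(b)) for b in BASE]
    best, (best_lo, best_hi) = max(candidates, key=lambda c: c[1][1])
    for _ in range(depth):
        # buffer: kernels whose interval overlaps the incumbent's, capped at buffer_size,
        # preferring the highest upper bounds
        overlap = [c for c in candidates if c[1][1] >= best_lo]
        buffer = sorted(overlap, key=lambda c: -c[1][1])[:buffer_size]
        candidates = [(c, bic_interval(c)) for k, _ in buffer for c in children(k)]
        if not candidates:
            break
        cand, (lo, hi) = max(candidates, key=lambda c: c[1][1])
        if hi > best_hi:                 # the upper bound breaks ties between kernels
            best, (best_lo, best_hi) = cand, (lo, hi)
    return best

The overlap test keeps any kernel whose upper bound still reaches above the incumbent's lower bound, which is what makes the search semi-greedy rather than strictly greedy.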

Figure 1: (a) Solar, fixed inducing points. Left: log marginal likelihood (ML) for the full GP with optimised hyperparameters, the optimised VAR LB for each of 10 random initialisations per m, the log ML for the best hyperparameters out of the 10, and the corresponding UB. Middle: the NLD term and its Nyström and RFF UBs. Right: the NIP term and its UB after iterations of CG/PCG with Nyström, FIC and PIC preconditioners. (b) Solar, learned inducing points: same as (a), except that the inducing points are learned in the LB optimisation and used for the subsequent computations.

4 Experiments

4.1 Investigating the behaviour of the lower bound (LB) and upper bound (UB)

We present results for experiments showing the bounds we obtain for two small time series and a multidimensional regression data set, for which CKS is feasible. The first is the annual solar irradiance data from 1610 to 2011, with 402 observations [18]. The second is the Mauna Loa CO2 time series [36] with 689 observations. See Appendix K for plots of the time series. The multidimensional data set is the concrete compressive strength data set with 1030 observations and 8 covariates [44]. The functional forms of the kernels used for each of these data sets have been found by CKS (see Figure 3). All observations and covariates have been normalised to have mean 0 and variance 1. From the left of Figures 1a, 10a and 11a (the latter two can be found in Appendix Q), we see that VAR gives a LB on the optimal log marginal likelihood that improves with increasing m. The best LB is tight relative to the log marginal likelihoods at the hyperparameters optimised by VAR. We also see that the UB is even tighter than the LB, and increases with m as hypothesised. From the middle plots, we observe that the Nyström approximation gives a very tight UB on the NLD term. We also tried using RFF to get a stochastic UB (see Appendix P), but this is not as tight, especially for larger values of m. From the right plots, we can see that PCG with any of the three preconditioners (Nyström, FIC, PIC) gives a very tight UB on the NIP term, whereas CG may require more iterations to get tight, for example in Figures 10a and 10b in Appendix Q. Figure 2 shows further experimental evidence for a wider range of kernels, reinforcing the claim that the UB is tighter than the LB as well as much less sensitive to the choice of inducing points. This is why the UB is a much more reliable metric for the model selection than the LB. Hence we use the UB for a one-off comparison of kernels when their BIC intervals overlap. Comparing Figures 1a, 10a, 11a against 1b, 10b, 11b, learning inducing points does not lead to a vast improvement in the VAR LB. In fact the differences are not very significant, and sometimes learning inducing points can get the LB stuck in a bad local minimum, as indicated by the high variance of the LB in the latter three figures. Moreover the difference in computational time is significant, as we can see in Table 2 of Appendix J. Hence the computation-accuracy trade-off is best when fixing the inducing points to a random subset of the training data. Table 2 also compares times for the different computations after fixing the inducing points. The gains from using the variational LB instead of the full GP are clear, especially for the larger data sets, and we also confirm that it is indeed the optimisation of the LB that is the bottleneck in terms of computational cost.
We also see that the NIP UB computation times are similarly fast for all m, thus convergence of PCG with the PIC preconditioner is happening in only a few iterations.

4.2 SKC on small data sets

We compare the kernels chosen by CKS and by SKC for the three data sets. The results are summarised in Figure 3. For solar, we see that SKC successfully finds SE x PER, which is the second highest kernel for CKS, with BIC very close to the top kernel. For Mauna, SKC selects (SE + PER) x SE + LIN, which is third highest for CKS and has a BIC very close to the top kernel. Looking at the values of the hyperparameters in the kernels PER and LIN found by SKC, 4 inducing points are sufficient for it to successfully find the correct periods and slopes in the data, reinforcing the claim that we only need a small m to find good hyperparameters (see Appendix K for details).

Figure 2: UB and LB for kernels at depth 1 on each dimension of the concrete data (LIN, PER and SE on each of the 8 dimensions), while varying the inducing points with hyperparameters fixed to the optimal values for the full GP. Error bars show mean plus/minus 1 standard deviation over 10 random sets of inducing points.

Figure 3: CKS & SKC results up to depth 6 (panels: CKS Solar; SKC Solar, m = 8, S = 2; CKS Mauna; SKC Mauna, m = 8, S = 2; CKS Concrete; SKC Concrete, m = 8, S = 2). Left: BIC of the kernels chosen at each depth by CKS. Right: BIC intervals of the kernels that have been added to the buffer by SKC with m = 8. The arrow indicates the kernel chosen at the end.

For concrete, a more challenging eight-dimensional data set, we see that the kernels selected by SKC do not match those selected by CKS, but it still manages to find similar additive structure such as SE1 + SE8 and SE4. Of course, the BIC intervals for kernels found by SKC are for hyperparameters found by VAR with m = 8, hence do not necessarily contain the optimal BIC of the kernels in CKS. However, the above results show that our method is still capable of selecting appropriate kernels even for low values of m, without having to home in on the optimal BIC using high values of m. The savings in computation time are significant even for these small data sets, as shown in Table 2 of Appendix J.

4.3 SKC on medium-sized data sets & why the lower bound is not enough

The UB has marginal benefits over the LB for small data sets, where the LB is already a fairly good estimate of the true BIC. However, this gap becomes significant as N grows and as kernels become more complex; the UB is much tighter and much more stable with respect to the choice of inducing points (as shown in Figure 2), and plays a crucial role in the model selection.

SKC on Power Plant (m = 32): SE1, (SE1)+SE2, ((SE1)+SE2)+SE2, (((SE1)+SE2)+SE2)*SE3, ((((SE1)+SE2)+SE2)*SE3)*SE4.
SKC-LB on Power Plant (m = 32): SE1, (SE1)+SE2, ((SE1)+SE2)*PER3, (((SE1)+SE2)*PER3)+LIN4, ((((SE1)+SE2)*PER3)+LIN4)+SE2.
Figure 4: Comparison of SKC (top) and SKC-LB (bottom), with m = 32, S = 1, up to depth 5 on the Power Plant data. The format is the same as Figure 3, but with the crosses at the true BIC instead of the BIC bounds.

Power Plant. We first show this for SKC on the Power Plant data with 9568 observations and 4 covariates [38]. We see from Figure 4 that again the UB is much tighter than the LB, especially for more complex kernels further down the search tree. Comparing the two plots, we also see that the use of the UB in SKC has a significant positive impact on model selection: the kernel found by SKC at depth 5 has a considerably higher BIC than the corresponding kernel found by just using the LB (SKC-LB). The pathological behaviour of SKC-LB occurs at depth 3, where the (SE1+SE2)*PER3 kernel found by SKC-LB has a higher LB than the corresponding SE1+SE2+SE2 kernel found by SKC. SKC-LB chooses the former kernel since it has a higher LB, which is a failure mode since the latter kernel has a significantly higher BIC. Due to the tightness of the UB, SKC correctly chooses the latter kernel over the former, escaping the failure mode.

CKS vs SKC runtime. We run CKS and SKC for m = 16, 32, 64 on the Power Plant data on the same machine with the same hyperparameter settings. The runtimes up to depths 1/2 are: 28.7h/94.7h (CKS), 0.6h/2.6h (SKC, m = 16), 1.1h/4.1h (SKC, m = 32), 4.2h/15.0h (SKC, m = 64). Again, the reductions in computation time for SKC are substantial.

Time Series. We also apply SKC to medium-sized time series data to explore whether it can successfully find known structure at different scales. Such data sets occur frequently as hourly time series over several years or daily time series over centuries.

GEFCOM. First we use the energy load data set from the 2012 Global Energy Forecasting Competition [16], an hourly time series of energy load from 2004/1/1 to 2008/6/30, with 8 weeks of missing data, giving N = 38,070. Notice that this data size is far beyond the scope of full GP optimisation in CKS.

Figure 5: Top: plot of the full GEFCOM data (load (kW) against year). Bottom: zoom in on the first 7 days (load (kW) against day in Jan 2004).

From the plots, we can see that there is a noisy 6-month periodicity in the time series, as well as a clear daily periodicity with peaks in the morning and evening. Despite the periodicity, there are some noticeable irregularities in the daily pattern.

Table 1: Hyperparameters of the kernels found by SKC on the GEFCOM data and on the Tidal data after normalising y. Length scales, periods and locations are converted to the original scale; sigma^2 is left as found with normalised y.
GEFCOM: SE1 (sigma^2 = 0.44, l = 6.5 days); PER1 (sigma^2 = 1.1, l = 189 days, p = 1.3 days); SE2 (sigma^2 = 0.18, l = 331 days); PER2 (sigma^2 = 0.6, l = 17 days, p = 174 days); LIN (sigma^2 = 0.17, loc = year 2006.1).
Tidal: SE (sigma^2 = 2.32, l = 82.5 days); PER1 (sigma^2 = 5.17, l = 126 days, p = 0.538 days); LIN (sigma^2 = 0.18, loc = year 2015); PER2 (sigma^2 = 0.8, l = 5974 days, p = 0.5 days); PER3 (sigma^2 = 0.21, l = 338 days, p = 14.6 days).

The SE1 x PER1 + SE2 x (PER2 + LIN) kernel found by SKC with m = 16 and its hyperparameters are summarised under GEFCOM in Table 1. Note that with only 16 inducing points, SKC has successfully found the daily periodicity (PER1) and the 6-month periodicity (PER2).
The SE1 kernel in the first additive term suggests a local periodicity, exhibiting the irregular observation pattern that is repeated daily. Also note that the hyperparameter corresponding to the longer periodicity is close to but not exactly half a year, owing to the noisiness of the periodicity that is apparent in the data. Moreover the magnitude of the second additive term SE2 x PER2, which contains the longer periodicity, is 0.18 x 0.6, which is much smaller than the magnitude of the first additive term SE1 x PER1. This explicitly shows that the second term has less effect than the first term on the behaviour of the time series, hence the weaker long-term periodicity.

Running the kernel search just using the LB with the same hyperparameters as SKC, the algorithm is only able to detect the daily periodicity and misses the 6-month periodicity. This is consistent with the observation that the LB is loose compared to the UB, especially for bigger data sets, and hence fails to evaluate the kernels correctly. We also tried selecting a random subset of the data (sizes 32, 64, 128, 256) and running CKS. Even for a subset of size 256, the algorithm could not find the daily periodicity. None of the four subset sizes were able to make it past depth 4 in the kernel search, struggling to find sophisticated structure.

Tidal. We also run SKC on tidal data, namely the sea level measurements in Dover from 2013/1/3 to 2016/8/31 [1]. This is an hourly time series, with each observation taken to be the mean of four readings taken every 15 minutes, giving N = 31,957.

Figure 6: Top: plot of the full tidal data (sea level elevation against year). Bottom: zoom in on the first 4 weeks (sea level elevation against days since Jan 2013).

Looking at the bottom of Figure 6, we can find clear 12-hour periodicities and amplitudes that follow slightly noisy bi-weekly periodicities. The shorter periods, called semi-diurnal tides, are on average 12 hours 25 minutes, around 0.518 days long, and the bi-weekly periods arise due to gravitational effects of the moon [22]. The SE x PER1 x LIN x PER2 x PER3 kernel found by SKC with m = 64 and its hyperparameters are summarised under Tidal in Table 1. The PER1 x PER3 kernel precisely corresponds to the doubly periodic structure in the data, whereby we have semi-daily periodicities with bi-weekly periodic amplitudes. The SE kernel has a length scale of 82.5 days, giving a local periodicity. This represents the irregularity of the amplitudes, as can be seen on the bottom plot of Figure 6 between days 2 and 4. The LIN kernel has a large magnitude, but its effect is negligible when multiplied in, since the slope is calculated to be very small (see Appendix K for the slope calculation formula). It essentially has the role of raising the magnitude of the resulting kernel and hence representing noise in the data. The PER2 kernel is also negligible due to its high length scale and small magnitude, indicating that the amplitude of the periodicity due to this term is very small. Hence the term has minimal effect on the kernel. The kernel search using just the LB fails to proceed past depth 3, since it is unable to find a kernel with a higher LB on the BIC than at the previous depth (so the extra penalty incurred by increasing model complexity outweighs the increase in the LB of the log marginal likelihood). CKS on a random subset of the data similarly fails, the kernel search halting at depth 2 even for random subsets as large as size 256. In both cases, SKC is able to detect the structure of the data and provide accurate numerical estimates of its features, all with many fewer inducing points than N, whereas the kernel search using just the LB or a random subset of the data both fail.

5 Conclusion and Discussion

We have introduced SKC, a scalable kernel discovery algorithm that extends CKS, and hence ABCD, to bigger data sets. We have also derived a novel cheap upper bound on the GP marginal likelihood that sandwiches the marginal likelihood with the variational lower bound [37], and use this interval in SKC for selecting between different kernels.
The reasons for using an upper bound instead of just the lower bound for model selection are as follows: the upper bound allows for a semi-greedy approach, letting us explore a wider range of kernels and compensating for the suboptimality coming from local optima in the hyperparameter optimisation. Should we wish to restrict the range of kernels explored for computational efficiency, the upper bound is tighter and more stable than the lower bound, hence we may use the upper bound as a reliable tie-breaker for kernels with overlapping BIC intervals. Equipped with this upper bound, our method can pinpoint global/local periodicities and linear trends in time series with tens of thousands of data points, which are well beyond the reach of its predecessor CKS. For future work we wish to make the algorithm even more scalable: for large data sets where quadratic runtime is infeasible, we can apply stochastic variational inference for GPs [15] to optimise the lower bound, the bottleneck of SKC, using mini-batches of data. Finding an upper bound that is cheap and tight enough for model selection would pose a challenge. Also, one drawback of the upper bound that could perhaps be resolved is that hyperparameter tuning by optimising the lower bound is difficult (see Appendix L). One other minor scope for future work is using more accurate estimates of the model evidence than the BIC.

A related paper uses the Laplace approximation instead of the BIC for kernel evaluation in a similar kernel search context [20]. However, the Laplace approximation adds an expensive Hessian term, for which it is unclear how one can obtain lower and upper bounds.

Acknowledgements

HK and YWT's research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) ERC grant agreement. We would also like to thank Michalis Titsias for helpful discussions.

References

[1] British Oceanographic Data Centre, UK Tide Gauge Network. data_systems/sea_level/uk_tide_gauge_network/processed/.
[2] Francis R. Bach, Gert R. G. Lanckriet, and Michael I. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In ICML, 2004.
[3] Rémy Bardenet and Michalis Titsias. Inference for determinantal point processes without spectral knowledge. In NIPS, 2015.
[4] Matthias Stephan Bauer, Mark van der Wilk, and Carl Edward Rasmussen. Understanding probabilistic sparse Gaussian process approximations. In NIPS, 2016.
[5] Matthew James Beal. Variational algorithms for approximate Bayesian inference. PhD thesis, University College London, 2003.
[6] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[7] Corinna Cortes, Mehryar Mohri, and Ameet Talwalkar. On the impact of kernel approximation on learning accuracy. In AISTATS, 2010.
[8] Kurt Cutajar, Michael A. Osborne, John P. Cunningham, and Maurizio Filippone. Preconditioning kernel matrices. In ICML, 2016.
[9] Petros Drineas and Michael Mahoney. On the Nyström method for approximating a Gram matrix for improved kernel-based learning. JMLR, 6, 2005.
[10] David Duvenaud, James Lloyd, Roger Grosse, Joshua Tenenbaum, and Zoubin Ghahramani. Structure discovery in nonparametric regression through compositional kernel search. In ICML, 2013.
[11] Chris Fraley and Adrian E. Raftery. Bayesian regularization for normal mixture estimation and model-based clustering. Journal of Classification, 24(2), 2007.
[12] Jacob Gardner, Chuan Guo, Kilian Weinberger, Roman Garnett, and Roger Grosse. Discovering and exploiting additive structure for Bayesian optimization. In AISTATS, 2017.
[13] Roger B. Grosse, Ruslan Salakhutdinov, William T. Freeman, and Joshua B. Tenenbaum. Exploiting compositionality to explore a large space of model structures. In UAI, 2012.
[14] Isabelle Guyon, Imad Chaabane, Hugo Jair Escalante, Sergio Escalera, Damir Jajetic, James Robert Lloyd, Núria Macià, Bisakha Ray, Lukasz Romaszko, Michèle Sebag, Alexander Statnikov, Sébastien Treguer, and Evelyne Viegas. A brief review of the ChaLearn AutoML challenge: Any-time any-dataset learning without human intervention. In Proceedings of the Workshop on Automatic Machine Learning, volume 64, pages 21-30, 2016.
[15] James Hensman, Nicolo Fusi, and Neil D. Lawrence. Gaussian processes for big data. In UAI, 2013.
[16] Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012, 2014.
[17] Chunlin Ji, Haige Shen, and Mike West. Bounded approximations for marginal likelihoods. 2010.
[18] Judith Lean, Juerg Beer, and Raymond S. Bradley. Reconstruction of solar irradiance since 1610: Implications for climate change. Geophysical Research Letters, 22(23), 1995.
[19] James Robert Lloyd, David Duvenaud, Roger Grosse, Joshua Tenenbaum, and Zoubin Ghahramani. Automatic construction and natural-language description of nonparametric regression models. In AAAI, 2014.
[20] Gustavo Malkomes, Charles Schaff, and Roman Garnett. Bayesian optimization for automated model selection. In NIPS, 2016.
[21] Alexander G. de G. Matthews, James Hensman, Richard E. Turner, and Zoubin Ghahramani. On sparse variational methods and the Kullback-Leibler divergence between stochastic processes. In AISTATS, 2016.
[22] John Morrissey, James L. Sumich, and Deanna R. Pinkard-Meier. Introduction to the Biology of Marine Life. Jones & Bartlett Learning.
[23] Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
[24] Junier B. Oliva, Avinava Dubey, Andrew G. Wilson, Barnabás Póczos, Jeff Schneider, and Eric P. Xing. Bayesian nonparametric kernel-learning. In AISTATS, 2016.
[25] Joaquin Quinonero-Candela and Carl Edward Rasmussen. A unifying view of sparse approximate Gaussian process regression. JMLR, 6, 2005.
[26] Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. In NIPS, 2007.
[27] Carl Edward Rasmussen and Chris Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
[28] Alessandro Rudi, Raffaello Camoriano, and Lorenzo Rosasco. Less is more: Nyström computational regularization. In NIPS, 2015.
[29] Walter Rudin. Fourier Analysis on Groups. AMS.
[30] Yves-Laurent Kom Samo and Stephen Roberts. Generalized spectral kernels. arXiv preprint, 2015.
[31] Gideon Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6(2), 1978.
[32] Matthias Seeger, Christopher Williams, and Neil Lawrence. Fast forward selection to speed up sparse Gaussian process regression. In AISTATS, 2003.

[33] Jonathan Richard Shewchuk. An introduction to the conjugate gradient method without the agonizing pain, 1994.
[34] Edward Snelson and Zoubin Ghahramani. Sparse Gaussian processes using pseudo-inputs. In NIPS, 2005.
[35] Arno Solin and Simo Särkkä. Explicit link between periodic covariance functions and state space models. JMLR, 2014.
[36] Pieter Tans and Ralph Keeling. NOAA/ESRL, Scripps Institution of Oceanography. ftp://ftp.cmdl.noaa.gov/ccg/co2/trends/co2_mlo.txt.
[37] Michalis K. Titsias. Variational learning of inducing variables in sparse Gaussian processes. In AISTATS, 2009.
[38] Pınar Tüfekci. Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods. International Journal of Electrical Power & Energy Systems, 60:126-140, 2014. Data set: datasets/combined+cycle+power+plant.
[39] Jarno Vanhatalo, Jaakko Riihimäki, Jouni Hartikainen, Pasi Jylänki, Ville Tolvanen, and Aki Vehtari. GPstuff: Bayesian modeling with Gaussian processes. JMLR, 14(Apr), 2013.
[40] Christopher Williams and Matthias Seeger. Using the Nyström method to speed up kernel machines. In NIPS, 2001.
[41] Andrew Gordon Wilson and Ryan Prescott Adams. Gaussian process kernels for pattern discovery and extrapolation. arXiv preprint, 2013.
[42] Andrew Gordon Wilson, Christoph Dann, and Hannes Nickisch. Thoughts on massively scalable Gaussian processes. arXiv preprint, 2015.
[43] Andrew Gordon Wilson, Zhiting Hu, Ruslan Salakhutdinov, and Eric P. Xing. Deep kernel learning. In AISTATS, 2016.
[44] I-C. Yeh. Modeling of strength of high-performance concrete using artificial neural networks. Cement and Concrete Research, 28(12), 1998. Data set: Concrete+Compressive+Strength.

Appendix

A Bayesian Information Criterion (BIC)

The BIC is a model selection criterion that is the marginal likelihood with a model complexity penalty:

BIC = log p(y | theta_hat) - (1/2) p log(N)

where theta_hat is the ML-II estimate of the p kernel hyperparameters and N is the number of observations.


More information

Interactive Markov Models of Evolutionary Algorithms

Interactive Markov Models of Evolutionary Algorithms Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary

More information

Correlated Bayesian Model Fusion: Efficient Performance Modeling of Large-Scale Tunable Analog/RF Integrated Circuits

Correlated Bayesian Model Fusion: Efficient Performance Modeling of Large-Scale Tunable Analog/RF Integrated Circuits Correlated Bayesian odel Fusion: Efficient Perforance odeling of Large-Scale unable Analog/RF Integrated Circuits Fa Wang and Xin Li ECE Departent, Carnegie ellon University, Pittsburgh, PA 53 {fwang,

More information

A method to determine relative stroke detection efficiencies from multiplicity distributions

A method to determine relative stroke detection efficiencies from multiplicity distributions A ethod to deterine relative stroke detection eiciencies ro ultiplicity distributions Schulz W. and Cuins K. 2. Austrian Lightning Detection and Inoration Syste (ALDIS), Kahlenberger Str.2A, 90 Vienna,

More information

paper prepared for the 1996 PTRC Conference, September 2-6, Brunel University, UK ON THE CALIBRATION OF THE GRAVITY MODEL

paper prepared for the 1996 PTRC Conference, September 2-6, Brunel University, UK ON THE CALIBRATION OF THE GRAVITY MODEL paper prepared for the 1996 PTRC Conference, Septeber 2-6, Brunel University, UK ON THE CALIBRATION OF THE GRAVITY MODEL Nanne J. van der Zijpp 1 Transportation and Traffic Engineering Section Delft University

More information

HIGH RESOLUTION NEAR-FIELD MULTIPLE TARGET DETECTION AND LOCALIZATION USING SUPPORT VECTOR MACHINES

HIGH RESOLUTION NEAR-FIELD MULTIPLE TARGET DETECTION AND LOCALIZATION USING SUPPORT VECTOR MACHINES ICONIC 2007 St. Louis, O, USA June 27-29, 2007 HIGH RESOLUTION NEAR-FIELD ULTIPLE TARGET DETECTION AND LOCALIZATION USING SUPPORT VECTOR ACHINES A. Randazzo,. A. Abou-Khousa 2,.Pastorino, and R. Zoughi

More information

Multi-Scale/Multi-Resolution: Wavelet Transform

Multi-Scale/Multi-Resolution: Wavelet Transform Multi-Scale/Multi-Resolution: Wavelet Transfor Proble with Fourier Fourier analysis -- breaks down a signal into constituent sinusoids of different frequencies. A serious drawback in transforing to the

More information

arxiv: v1 [cs.ds] 29 Jan 2012

arxiv: v1 [cs.ds] 29 Jan 2012 A parallel approxiation algorith for ixed packing covering seidefinite progras arxiv:1201.6090v1 [cs.ds] 29 Jan 2012 Rahul Jain National U. Singapore January 28, 2012 Abstract Penghui Yao National U. Singapore

More information

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Soft Coputing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Beverly Rivera 1,2, Irbis Gallegos 1, and Vladik Kreinovich 2 1 Regional Cyber and Energy Security Center RCES

More information

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lesson 1 4 October 2017 Outline Learning and Evaluation for Pattern Recognition Notation...2 1. The Pattern Recognition

More information

3.3 Variational Characterization of Singular Values

3.3 Variational Characterization of Singular Values 3.3. Variational Characterization of Singular Values 61 3.3 Variational Characterization of Singular Values Since the singular values are square roots of the eigenvalues of the Heritian atrices A A and

More information

An improved self-adaptive harmony search algorithm for joint replenishment problems

An improved self-adaptive harmony search algorithm for joint replenishment problems An iproved self-adaptive harony search algorith for joint replenishent probles Lin Wang School of Manageent, Huazhong University of Science & Technology zhoulearner@gail.co Xiaojian Zhou School of Manageent,

More information

A remark on a success rate model for DPA and CPA

A remark on a success rate model for DPA and CPA A reark on a success rate odel for DPA and CPA A. Wieers, BSI Version 0.5 andreas.wieers@bsi.bund.de Septeber 5, 2018 Abstract The success rate is the ost coon evaluation etric for easuring the perforance

More information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information Cite as: Straub D. (2014). Value of inforation analysis with structural reliability ethods. Structural Safety, 49: 75-86. Value of Inforation Analysis with Structural Reliability Methods Daniel Straub

More information

C na (1) a=l. c = CO + Clm + CZ TWO-STAGE SAMPLE DESIGN WITH SMALL CLUSTERS. 1. Introduction

C na (1) a=l. c = CO + Clm + CZ TWO-STAGE SAMPLE DESIGN WITH SMALL CLUSTERS. 1. Introduction TWO-STGE SMPLE DESIGN WITH SMLL CLUSTERS Robert G. Clark and David G. Steel School of Matheatics and pplied Statistics, University of Wollongong, NSW 5 ustralia. (robert.clark@abs.gov.au) Key Words: saple

More information

arxiv: v2 [cs.lg] 30 Mar 2017

arxiv: v2 [cs.lg] 30 Mar 2017 Batch Renoralization: Towards Reducing Minibatch Dependence in Batch-Noralized Models Sergey Ioffe Google Inc., sioffe@google.co arxiv:1702.03275v2 [cs.lg] 30 Mar 2017 Abstract Batch Noralization is quite

More information

Graphical Models in Local, Asymmetric Multi-Agent Markov Decision Processes

Graphical Models in Local, Asymmetric Multi-Agent Markov Decision Processes Graphical Models in Local, Asyetric Multi-Agent Markov Decision Processes Ditri Dolgov and Edund Durfee Departent of Electrical Engineering and Coputer Science University of Michigan Ann Arbor, MI 48109

More information

Research in Area of Longevity of Sylphon Scraies

Research in Area of Longevity of Sylphon Scraies IOP Conference Series: Earth and Environental Science PAPER OPEN ACCESS Research in Area of Longevity of Sylphon Scraies To cite this article: Natalia Y Golovina and Svetlana Y Krivosheeva 2018 IOP Conf.

More information

Predictive Vaccinology: Optimisation of Predictions Using Support Vector Machine Classifiers

Predictive Vaccinology: Optimisation of Predictions Using Support Vector Machine Classifiers Predictive Vaccinology: Optiisation of Predictions Using Support Vector Machine Classifiers Ivana Bozic,2, Guang Lan Zhang 2,3, and Vladiir Brusic 2,4 Faculty of Matheatics, University of Belgrade, Belgrade,

More information

Ştefan ŞTEFĂNESCU * is the minimum global value for the function h (x)

Ştefan ŞTEFĂNESCU * is the minimum global value for the function h (x) 7Applying Nelder Mead s Optiization Algorith APPLYING NELDER MEAD S OPTIMIZATION ALGORITHM FOR MULTIPLE GLOBAL MINIMA Abstract Ştefan ŞTEFĂNESCU * The iterative deterinistic optiization ethod could not

More information

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Proc. of the IEEE/OES Seventh Working Conference on Current Measureent Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Belinda Lipa Codar Ocean Sensors 15 La Sandra Way, Portola Valley, CA 98 blipa@pogo.co

More information

Variational Bayes for generic topic models

Variational Bayes for generic topic models Variational Bayes for generic topic odels Gregor Heinrich 1 and Michael Goesele 2 1 Fraunhofer IGD and University of Leipzig 2 TU Darstadt Abstract. The article contributes a derivation of variational

More information

On Hyper-Parameter Estimation in Empirical Bayes: A Revisit of the MacKay Algorithm

On Hyper-Parameter Estimation in Empirical Bayes: A Revisit of the MacKay Algorithm On Hyper-Paraeter Estiation in Epirical Bayes: A Revisit of the MacKay Algorith Chune Li 1, Yongyi Mao, Richong Zhang 1 and Jinpeng Huai 1 1 School of Coputer Science and Engineering, Beihang University,

More information

A MESHSIZE BOOSTING ALGORITHM IN KERNEL DENSITY ESTIMATION

A MESHSIZE BOOSTING ALGORITHM IN KERNEL DENSITY ESTIMATION A eshsize boosting algorith in kernel density estiation A MESHSIZE BOOSTING ALGORITHM IN KERNEL DENSITY ESTIMATION C.C. Ishiekwene, S.M. Ogbonwan and J.E. Osewenkhae Departent of Matheatics, University

More information

Foundations of Machine Learning Boosting. Mehryar Mohri Courant Institute and Google Research

Foundations of Machine Learning Boosting. Mehryar Mohri Courant Institute and Google Research Foundations of Machine Learning Boosting Mehryar Mohri Courant Institute and Google Research ohri@cis.nyu.edu Weak Learning Definition: concept class C is weakly PAC-learnable if there exists a (weak)

More information

Optimal nonlinear Bayesian experimental design: an application to amplitude versus offset experiments

Optimal nonlinear Bayesian experimental design: an application to amplitude versus offset experiments Geophys. J. Int. (23) 155, 411 421 Optial nonlinear Bayesian experiental design: an application to aplitude versus offset experients Jojanneke van den Berg, 1, Andrew Curtis 2,3 and Jeannot Trapert 1 1

More information

Bayesian Approach for Fatigue Life Prediction from Field Inspection

Bayesian Approach for Fatigue Life Prediction from Field Inspection Bayesian Approach for Fatigue Life Prediction fro Field Inspection Dawn An and Jooho Choi School of Aerospace & Mechanical Engineering, Korea Aerospace University, Goyang, Seoul, Korea Srira Pattabhiraan

More information

Support Vector Machines. Machine Learning Series Jerry Jeychandra Blohm Lab

Support Vector Machines. Machine Learning Series Jerry Jeychandra Blohm Lab Support Vector Machines Machine Learning Series Jerry Jeychandra Bloh Lab Outline Main goal: To understand how support vector achines (SVMs) perfor optial classification for labelled data sets, also a

More information

DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS

DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS ISSN 1440-771X AUSTRALIA DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS An Iproved Method for Bandwidth Selection When Estiating ROC Curves Peter G Hall and Rob J Hyndan Working Paper 11/00 An iproved

More information

Randomized Recovery for Boolean Compressed Sensing

Randomized Recovery for Boolean Compressed Sensing Randoized Recovery for Boolean Copressed Sensing Mitra Fatei and Martin Vetterli Laboratory of Audiovisual Counication École Polytechnique Fédéral de Lausanne (EPFL) Eail: {itra.fatei, artin.vetterli}@epfl.ch

More information

Efficient Filter Banks And Interpolators

Efficient Filter Banks And Interpolators Efficient Filter Banks And Interpolators A. G. DEMPSTER AND N. P. MURPHY Departent of Electronic Systes University of Westinster 115 New Cavendish St, London W1M 8JS United Kingdo Abstract: - Graphical

More information

1 Identical Parallel Machines

1 Identical Parallel Machines FB3: Matheatik/Inforatik Dr. Syaantak Das Winter 2017/18 Optiizing under Uncertainty Lecture Notes 3: Scheduling to Miniize Makespan In any standard scheduling proble, we are given a set of jobs J = {j

More information

EMPIRICAL COMPLEXITY ANALYSIS OF A MILP-APPROACH FOR OPTIMIZATION OF HYBRID SYSTEMS

EMPIRICAL COMPLEXITY ANALYSIS OF A MILP-APPROACH FOR OPTIMIZATION OF HYBRID SYSTEMS EMPIRICAL COMPLEXITY ANALYSIS OF A MILP-APPROACH FOR OPTIMIZATION OF HYBRID SYSTEMS Jochen Till, Sebastian Engell, Sebastian Panek, and Olaf Stursberg Process Control Lab (CT-AST), University of Dortund,

More information

Soft-margin SVM can address linearly separable problems with outliers

Soft-margin SVM can address linearly separable problems with outliers Non-linear Support Vector Machines Non-linearly separable probles Hard-argin SVM can address linearly separable probles Soft-argin SVM can address linearly separable probles with outliers Non-linearly

More information

lecture 36: Linear Multistep Mehods: Zero Stability

lecture 36: Linear Multistep Mehods: Zero Stability 95 lecture 36: Linear Multistep Mehods: Zero Stability 5.6 Linear ultistep ethods: zero stability Does consistency iply convergence for linear ultistep ethods? This is always the case for one-step ethods,

More information

Domain-Adversarial Neural Networks

Domain-Adversarial Neural Networks Doain-Adversarial Neural Networks Hana Ajakan, Pascal Gerain 2, Hugo Larochelle 3, François Laviolette 2, Mario Marchand 2,2 Départeent d inforatique et de génie logiciel, Université Laval, Québec, Canada

More information

Lecture 9: Multi Kernel SVM

Lecture 9: Multi Kernel SVM Lecture 9: Multi Kernel SVM Stéphane Canu stephane.canu@litislab.eu Sao Paulo 204 April 6, 204 Roadap Tuning the kernel: MKL The ultiple kernel proble Sparse kernel achines for regression: SVR SipleMKL:

More information

Figure 1: Equivalent electric (RC) circuit of a neurons membrane

Figure 1: Equivalent electric (RC) circuit of a neurons membrane Exercise: Leaky integrate and fire odel of neural spike generation This exercise investigates a siplified odel of how neurons spike in response to current inputs, one of the ost fundaental properties of

More information

An Extension to the Tactical Planning Model for a Job Shop: Continuous-Time Control

An Extension to the Tactical Planning Model for a Job Shop: Continuous-Time Control An Extension to the Tactical Planning Model for a Job Shop: Continuous-Tie Control Chee Chong. Teo, Rohit Bhatnagar, and Stephen C. Graves Singapore-MIT Alliance, Nanyang Technological Univ., and Massachusetts

More information

LONG-TERM PREDICTIVE VALUE INTERVAL WITH THE FUZZY TIME SERIES

LONG-TERM PREDICTIVE VALUE INTERVAL WITH THE FUZZY TIME SERIES Journal of Marine Science and Technology, Vol 19, No 5, pp 509-513 (2011) 509 LONG-TERM PREDICTIVE VALUE INTERVAL WITH THE FUZZY TIME SERIES Ming-Tao Chou* Key words: fuzzy tie series, fuzzy forecasting,

More information

A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine. (1900 words)

A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine. (1900 words) 1 A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine (1900 words) Contact: Jerry Farlow Dept of Matheatics Univeristy of Maine Orono, ME 04469 Tel (07) 866-3540 Eail: farlow@ath.uaine.edu

More information

Geometrical intuition behind the dual problem

Geometrical intuition behind the dual problem Based on: Geoetrical intuition behind the dual proble KP Bennett, EJ Bredensteiner, Duality and Geoetry in SVM Classifiers, Proceedings of the International Conference on Machine Learning, 2000 1 Geoetrical

More information

Chapter 6 1-D Continuous Groups

Chapter 6 1-D Continuous Groups Chapter 6 1-D Continuous Groups Continuous groups consist of group eleents labelled by one or ore continuous variables, say a 1, a 2,, a r, where each variable has a well- defined range. This chapter explores:

More information