A Measure of Directional Outlyingness with Applications to Image Data and Video


Peter J. Rousseeuw, Jakob Raymaekers and Mia Hubert
Department of Mathematics, KU Leuven, Belgium

August 8, 2016

Abstract

Functional data often consist of functions of a univariate quantity such as time or wavelength, whereas images can be considered as functional data with a bivariate domain. In either case the function values may themselves be multivariate. To detect outlying functions, we propose to compute a measure of outlyingness at each gridpoint. For this purpose we introduce directional outlyingness, which accounts for skewness and requires only O(n) computation time per direction. We derive the influence function of this outlyingness measure and compute a cutoff for outlier detection. The resulting heatmap and functional outlier map reflect local and global outlyingness of a function. To illustrate the performance of the method on real data it is applied to spectra, MRI images, and video surveillance data.

Keywords: Directional outlyingness, Functional Data, Influence function, Robustness, Skewness.

This research has been supported by projects of Internal Funds KU Leuven. The authors are grateful for interesting discussions with Pieter Segaert.

1 Introduction

Functional data analysis (Ramsay and Silverman, 2005; Ferraty and Vieu, 2006) is a booming research area. Often the focus is on functions with a univariate domain, such as time series or spectra. The function values may be multivariate, such as height and weight of children by age, or measurements of human heart activity at two different places on the body. In this paper we will also consider functions whose domain is multivariate. In particular, images and surfaces are functions on a bivariate domain. Our methods generalize to higher-dimensional domains, e.g. the voxels of a 3D image of a human brain are defined on a trivariate domain.

Detecting outliers in functional data is an important task. Recent developments include the approaches of Febrero-Bande et al. (2008) and Hyndman and Shang (2010). Sun and Genton (2011) proposed the functional boxplot, and Arribas-Gil and Romo (2014) developed the outliergram. Our approach is somewhat different. To detect outlying functions or outlying parts of a function (in a data set consisting of several functions) we will look at its (possibly multivariate) function value at every time point/pixel/voxel/... of its domain. For this purpose we need a tool that assigns a measure of outlyingness to every data point in a multivariate non-functional sample. A popular measure is the Stahel-Donoho outlyingness (SDO) due to Stahel (1981) and Donoho (1982), which works best when the distribution of the inliers is roughly elliptical. However, it is less suited for skewed data. To address this issue, Brys et al. (2005) proposed the skewness-adjusted outlyingness (AO), which takes the skewness of the underlying distribution into account. However, the AO has two drawbacks. The first is that the AO scale has a large bias as soon as the contamination fraction exceeds 10%. Furthermore, its computation time is O(n log n) per direction due to its rather involved construction.

To remedy these deficiencies we propose a new measure in this paper, the directional outlyingness (DO). The DO also takes the skewness of the underlying distribution into account, but its scale is more robust and it requires only O(n) operations. The next section defines the DO, investigates its theoretical properties, and illustrates it on univariate, bivariate and spectral data. Section 3 derives a cutoff value for the DO and applies it to outlier detection. It also extends the functional outlier map of Hubert et al. (2015) to the DO, and constructs in it a curve separating outliers from inliers. Section 4 shows an application to MRI images, and Section 5 analyzes video data.

2 A Notion of Directional Outlyingness

2.1 Univariate Setting

In the univariate setting, the Stahel-Donoho outlyingness of a point y relative to a sample Y = {y_1, ..., y_n} is defined as

    SDO(y; Y) = |y - med(Y)| / MAD(Y)    (1)

where the denominator is the median absolute deviation (MAD) of the sample, given by MAD(Y) = med_i(|y_i - med_j(y_j)|) / Φ^{-1}(0.75), where Φ is the standard normal cdf. The SDO is affine invariant, meaning that it remains the same when a constant is added to Y and y, and also when they are multiplied by a nonzero constant.

A limitation of the SDO is that it implicitly assumes the inliers (i.e. the non-outliers) to be roughly symmetrically distributed. But when the inliers have a skewed distribution, using the MAD as a single measure of scale does not capture the asymmetry. For instance, when the data stem from a right-skewed distribution, the SDO may become large for inliers on the right hand side, and not large enough for actual outliers on the left hand side. This observation led to the skewness-adjusted outlyingness (AO) proposed by Brys et al. (2005). This notion employs a robust measure of skewness called the medcouple (Brys et al., 2004), which however requires O(n log n) computation time. Moreover, we will see in the next subsection that it leads to a rather large explosion bias.

In this paper we propose the notion of directional outlyingness (DO), which also takes the potential skewness of the underlying distribution into account, while attaining a smaller computation time and bias. The main idea is to split the sample into two half samples, and then to apply a robust scale estimator to each of them. More precisely, let y_1 ≤ y_2 ≤ ... ≤ y_n be a univariate sample. (The actual algorithm does not require sorting the data.) We then construct two subsamples of size h = ⌊(n+1)/2⌋ as follows. For even n we take Y_a = {y_{h+1}, ..., y_n} and Y_b = {y_1, ..., y_h}, where the subscripts a and b stand for above and below the median.

For odd n we put Y_a = {y_h, ..., y_n} and Y_b = {y_1, ..., y_h}, so that Y_a and Y_b share one data point and have the same size. Next, we compute a scale estimate for each subsample. Among many available robust estimators we choose a one-step M-estimator with Huber ρ-function, due to its fast computation and favorable properties. We first compute the initial scale estimates s_{o,a}(Y) = med(Z_a)/Φ^{-1}(0.75) and s_{o,b}(Y) = med(Z_b)/Φ^{-1}(0.75), where Z_a = Y_a - med(Y) and Z_b = med(Y) - Y_b, and where Φ^{-1}(0.75) ensures consistency for gaussian data. The one-step M-estimates are then given by

    s_a(Y) = s_{o,a}(Y) \sqrt{ \frac{1}{2\alpha h} \sum_{z_i \in Z_a} \rho_c\Big( \frac{z_i}{s_{o,a}(Y)} \Big) }   and   s_b(Y) = s_{o,b}(Y) \sqrt{ \frac{1}{2\alpha h} \sum_{z_i \in Z_b} \rho_c\Big( \frac{z_i}{s_{o,b}(Y)} \Big) }    (2)

where ρ_c denotes the Huber rho function for scale

    \rho_c(t) = (t/c)^2 \, 1_{[-c,c]}(t) + 1_{(-\infty,-c) \cup (c,\infty)}(t)

with c a tuning parameter regulating the trade-off between efficiency and bias, and where α = \int_0^\infty \rho_c(x) \, d\Phi(x).

Finally, the directional outlyingness (DO) of a point y relative to a univariate sample Y = {y_1, ..., y_n} is given by

    DO(y; Y) = (y - med(Y)) / s_a(Y)   if y ≥ med(Y)
    DO(y; Y) = (med(Y) - y) / s_b(Y)   if y ≤ med(Y).    (3)

Note that the DO is affine invariant. In particular, flipping the signs of Y and y interchanges s_a and s_b, which results in DO(-y; -Y) = DO(y; Y).

Figure 1 illustrates the denominators of the SDO and DO expressions on the family income dataset, which contains 8962 strictly positive annual family incomes in the tax year 2012. Their histogram is clearly right-skewed. The MAD in the denominator of the SDO equals $42,650 and is used both to the left and to the right of the median, as depicted by the orange arrows. For the DO the above scale s_a = $58,681 exceeds the below scale s_b = $35,737 (blue arrows). Therefore, a point to the right of the median will have a lower DO than a point to the left at the same distance to the median. This is a desirable property in view of the difference between the left and right tails.
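To make the construction concrete, here is a minimal R sketch of the univariate DO. It is not the authors' released code: the names rhoHuber, doUniv and oneStepM are ours, and for clarity the sketch sorts the data even though the actual algorithm does not require sorting.

    # Minimal R sketch of the univariate directional outlyingness (DO).
    rhoHuber <- function(t, c = 2.1) pmin((t / c)^2, 1)   # Huber rho for scale

    doUniv <- function(y, Y, c = 2.1) {
      m  <- median(Y)
      n  <- length(Y)
      h  <- floor((n + 1) / 2)
      Ys <- sort(Y)
      Za <- Ys[(n - h + 1):n] - m        # deviations above the median
      Zb <- m - Ys[1:h]                  # deviations below the median
      alpha <- integrate(function(x) rhoHuber(x, c) * dnorm(x), 0, Inf)$value
      oneStepM <- function(Z) {
        s0 <- median(Z) / qnorm(0.75)    # initial scale estimate
        s0 * sqrt(sum(rhoHuber(Z / s0, c)) / (2 * alpha * h))
      }
      sa <- oneStepM(Za)                 # scale of the upper half sample
      sb <- oneStepM(Zb)                 # scale of the lower half sample
      ifelse(y >= m, (y - m) / sa, (m - y) / sb)
    }

On data such as the incomes above, this reproduces the asymmetry of Figure 1: points above the median are divided by the larger scale s_a and points below it by the smaller s_b.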

Figure 1: Scale estimates of the family income data.

2.2 Robustness Properties

Let us now study the robustness properties of the scales s_a and s_b and the resulting DO. It will be convenient to write s_a and s_b as functionals of the data distribution F:

    s_a^2(F) = \frac{s_{o,a}^2(F)}{\alpha} \int_{med(F)}^{\infty} \rho_c\Big( \frac{x - med(F)}{s_{o,a}(F)} \Big) \, dF(x)
    s_b^2(F) = \frac{s_{o,b}^2(F)}{\alpha} \int_{-\infty}^{med(F)} \rho_c\Big( \frac{med(F) - x}{s_{o,b}(F)} \Big) \, dF(x)    (4)

where ρ_c is the Huber ρ-function.

We will first focus on the worst-case bias of s_a due to a fraction ε of contamination, following Martin and Zamar (1993). At a given distribution F, the explosion bias curve of s_a is defined as

    B^+(\varepsilon, s_a, F) = \sup_{G \in \mathcal{F}_\varepsilon} s_a(G)

where \mathcal{F}_\varepsilon = \{G : G = (1-\varepsilon)F + \varepsilon H\}, in which H can be any distribution. The implosion bias curve is defined similarly as B^-(\varepsilon, s_a, F) = \inf_{G \in \mathcal{F}_\varepsilon} s_a(G). From here onward we will assume that F is symmetric about some center m and has a continuous density f(x) which is strictly decreasing in |x - m|. In order to derive the explosion and implosion bias we require the following lemma (all proofs can be found in the supplementary material):

Lemma 1.
(i) For fixed µ it holds that t^2 \int_{\mu}^{\infty} \rho_c\big(\frac{x-\mu}{t}\big) \, dF(x) is strictly increasing in t > 0;
(ii) For fixed σ > 0 it holds that \sigma^2 \int_{t}^{\infty} \rho_c\big(\frac{x-t}{\sigma}\big) \, dF(x) is strictly decreasing in t.

Proposition 1. For any 0 < ε < 0.25 the implosion bias of s_a is given by

    B^-(\varepsilon, s_a, F)^2 = \frac{B^-(\varepsilon, s_{o,a}, F)^2}{\alpha} \Big\{ (1-\varepsilon) \int_{B^+(\varepsilon, med, F)}^{\infty} \rho_c\Big( \frac{x - B^+(\varepsilon, med, F)}{B^-(\varepsilon, s_{o,a}, F)} \Big) dF(x) \Big\}

where B^+(\varepsilon, med, F) = F^{-1}\big(\frac{1}{2(1-\varepsilon)}\big) and B^-(\varepsilon, s_{o,a}, F) = \big\{ F^{-1}\big(\frac{3-4\varepsilon}{4(1-\varepsilon)}\big) - F^{-1}\big(\frac{1}{2(1-\varepsilon)}\big) \big\} / \Phi^{-1}(0.75).

In fact, the implosion bias of s_a is reached when H = \Delta\big(F^{-1}\big(\frac{1}{2(1-\varepsilon)}\big)\big), the distribution that puts all its mass in the point F^{-1}\big(\frac{1}{2(1-\varepsilon)}\big). Note that the implosion breakdown value of s_a is 25% because for ε → 0.25 we obtain s_a → 0.

Proposition 2. For any 0 < ε < 0.25 the explosion bias of s_a is given by

    B^+(\varepsilon, s_a, F)^2 = \frac{B^+(\varepsilon, s_{o,a}, F)^2}{\alpha} \Big\{ (1-\varepsilon) \int_{B^+(\varepsilon, med, F)}^{\infty} \rho_c\Big( \frac{x - B^+(\varepsilon, med, F)}{B^+(\varepsilon, s_{o,a}, F)} \Big) dF(x) + \varepsilon \Big\}

where B^+(\varepsilon, s_{o,a}, F) = \big\{ F^{-1}\big(\frac{3}{4(1-\varepsilon)}\big) - F^{-1}\big(\frac{1}{2(1-\varepsilon)}\big) \big\} / \Phi^{-1}(0.75). The explosion bias of s_a is reached at all distributions F_\varepsilon = (1-\varepsilon)F + \varepsilon\Delta(d) for which d > B^+(\varepsilon, med, F) + c\,B^+(\varepsilon, s_{o,a}, F), which ensures that d lands on the constant part of ρ_c. For ε → 0.25 we find d → ∞ and s_a → ∞, so the explosion breakdown value of s_a is 25%.

The blue curves in Figure 2 are the explosion and implosion bias curves of s_a when F = Φ is the standard gaussian distribution, and the tuning constant in ρ_c is c = 2.1, corresponding

to 85% efficiency. By affine equivariance the curves for s_b are exactly the same, so these are the curves of both DO scales. The orange curves correspond to explosion and implosion of the scales used in the adjusted outlyingness AO under the same contamination. We see that the AO scale explodes faster, due to using the medcouple in its definition. The fact that the DO scale is typically smaller enables the DO to attain larger values in outliers.

Figure 2: Comparison of the explosion and implosion bias of the AO and DO scales.

Another tool to measure the (non-)robustness of a procedure is the influence function (IF). Let T be a statistical functional, and consider the contaminated distribution F_{ε,z} = (1-ε)F + εΔ(z). The influence function of T at F is then given by

    IF(z, T, F) = \lim_{\varepsilon \downarrow 0} \frac{T(F_{\varepsilon,z}) - T(F)}{\varepsilon} = \frac{\partial}{\partial \varepsilon} T(F_{\varepsilon,z}) \Big|_{\varepsilon=0}

and basically describes how T reacts to a small amount of contamination. This concept justifies our choice of the function ρ_c. Indeed, the IF of a fully iterated M-estimator of scale with function ρ is proportional to ρ(z) - β with the constant β = \int \rho(x) \, dF(x). We use ρ = ρ_c with c = 2.1. It was shown in Hampel et al. (1986) that at F = Φ this ρ_c yields the M-estimator with the highest asymptotic efficiency subject to an upper bound on the absolute value of its IF.

We will now derive the IF of the one-step M-estimator s_a given by (4).

Proposition 3. The influence function of s_a is given by

    IF(z, s_a, F) = \frac{1}{2\alpha s_a(F)} \bigg[ \Big\{ 2 s_{o,a}(F) \int_{med(F)}^{\infty} \rho_c\Big( \frac{x - med(F)}{s_{o,a}(F)} \Big) dF(x) - \int_{med(F)}^{\infty} (x - med(F)) \, \rho_c'\Big( \frac{x - med(F)}{s_{o,a}(F)} \Big) dF(x) \Big\} \, IF(z, s_{o,a}, F)
        - s_{o,a}(F) \Big\{ \int_{med(F)}^{\infty} \rho_c'\Big( \frac{x - med(F)}{s_{o,a}(F)} \Big) dF(x) \Big\} \, IF(z, med, F)
        + s_{o,a}^2(F) \Big\{ 1_{[med(F),\infty)}(z) \, \rho_c\Big( \frac{z - med(F)}{s_{o,a}(F)} \Big) - \alpha \Big\} \bigg]

where IF(z, s_{o,a}, F) is the influence function of s_{o,a}.

Figure 3: Influence function of s_a at F = Φ.

The resulting IF of s_a at F = Φ is shown in Figure 3. [Note that IF(z, s_b, Φ) = IF(-z, s_a, Φ).] It is bounded, indicating that s_a is robust to a small amount of contamination even when it is far away. Note that the IF has a jump at the third quartile Q_3 due to the initial estimate s_{o,a}. If we were to iterate (4) to convergence this jump would vanish, but then the explosion bias would go up a lot, similarly to the computation in Rousseeuw and Croux (1994).

Let us now compute the influence function of DO(x; F) given by (3) for contamination in the point z, noting that x and z need not be the same.

Proposition 4. When x > med(F) it holds that

    IF(z, DO(x), F) = \frac{-1}{s_a^2(F)} \big\{ IF(z, med, F) \, s_a(F) + IF(z, s_a, F) \, (x - med(F)) \big\}

whereas for x < med(F) we obtain

    IF(z, DO(x), F) = \frac{1}{s_b^2(F)} \big\{ IF(z, med, F) \, s_b(F) - IF(z, s_b, F) \, (med(F) - x) \big\}.

Figure 4: Influence function of DO(x) for F = Φ.

For a fixed value of x the influence function of DO(x) is bounded in z, which is a desirable robustness property. Figure 4 shows the influence function when F is the standard gaussian distribution.

2.3 Multivariate Setting

In the multivariate setting we can apply the principle that a point is outlying with respect to a dataset if it stands out in at least one direction. Like the Stahel-Donoho outlyingness,

the multivariate DO is defined by means of univariate projections. To be precise, the DO of a d-variate point y relative to a d-variate sample Y = {y_1, ..., y_n} is defined as

    DO(y; Y) = \sup_{v \in \mathbb{R}^d} DO(y^T v; Y^T v)    (5)

where the right hand side uses the univariate DO of (3). To compute the multivariate DO we have to rely on approximate algorithms, as it is impossible to project on all directions v in d-dimensional space. A popular procedure to generate a direction is to randomly draw d data points, compute the hyperplane passing through them, and then to take the direction v orthogonal to it. This guarantees that the multivariate DO is affine invariant. That is, the DO does not change when we add a constant vector to the data, or multiply the data by a nonsingular d × d matrix.

Figure 5: Bloodfat data with (a) DO contours, and (b) SDO contours.

As an illustration we take the bloodfat data set, which contains plasma cholesterol and plasma triglyceride concentrations (in mg/dl) of 320 male subjects for whom there is evidence of narrowing arteries (Hand et al., 1994). Here n = 320 and d = 2, and following Hubert and Van der Veeken (2008) we generated 250d = 500 directions v. Figure 5 shows the contour plots of both the DO and SDO measures. We see that the contours of the DO capture the skewness in the dataset, whereas those of the SDO are more symmetric even though the data themselves are not.
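A minimal R sketch of this projection pursuit approximation of (5) is given below. It reuses doUniv from Section 2.1; the helper name doMulti is ours, not that of the authors' released code.

    # Approximate the supremum in (5) over ndir random directions, each
    # orthogonal to a hyperplane through d randomly drawn data points.
    doMulti <- function(y, Y, ndir = 250 * ncol(Y)) {
      d <- ncol(Y); n <- nrow(Y)
      out <- 0
      for (k in seq_len(ndir)) {
        idx <- sample(n, d)
        A   <- sweep(Y[idx, , drop = FALSE], 2, Y[idx[1], ])  # differences
        v   <- svd(A, nu = 0)$v[, d]   # normal vector of the hyperplane
        out <- max(out, doUniv(sum(y * v), as.vector(Y %*% v)))
      }
      out
    }

For the bloodfat data (d = 2) the default amounts to the 500 directions used above.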

2.4 Functional Directional Outlyingness

We now extend the DO to data where the objects are functions. To fix ideas we will consider an example. The glass data set consists of spectra measured at 750 wavelengths, resulting from spectroscopy on n = 180 archeological glass samples (Lemberge et al., 2000). Figure 6 shows the 180 curves.

Figure 6: Spectra of 180 archeological glass samples.

At each wavelength the response is a single number, the intensity, so this is a univariate functional dataset. However, we can incorporate the dynamic behavior of these curves by numerically computing their first derivative. This yields bivariate functions, where the response consists of both the intensity and its derivative.

In general we write a functional dataset as Y = {Y_1, Y_2, ..., Y_n} where each Y_i is a d-dimensional function. As in this example, the Y_i are typically observed on a discrete set of points in their domain. For a univariate domain this set is denoted as {t_1, ..., t_T}. Now we want to define the DO of a d-variate function X on the same domain, where X need not be one of the Y_i. For this we look at all the domain points t_j in turn, and define the functional directional outlyingness (fdo) of X with respect to the sample Y as

    fdo(X; Y) = \sum_{j=1}^{T} DO(X(t_j); Y(t_j)) \, W(t_j)    (6)

where W(.) is a weight function for which \sum_{j=1}^{T} W(t_j) = 1. Such a weight function allows us to assign a different importance to the outlyingness of a curve at different domain points.

For instance, one could downweight time points near the boundaries if measurements are recorded less accurately at the beginning and the end of the process.

Figure 7: fdo values of the 180 glass data functions.

Figure 7 shows the fdo of the 180 bivariate functions in the glass data, where W(.) was set to zero for the first 13 wavelengths, where the spectra had no variability, and constant at the remaining wavelengths. These fdo values allow us to rank the spectra from most to least outlying, but do not contain much information about how the outlying curves differ from the majority.

In addition to this global outlyingness measure fdo we also want to look at the local outlyingness. To this end Figure 8 shows the individual DO(Y_i(t_j); Y(t_j)) for each of the 180 functions Y_i of the glass data at each wavelength t_j; higher values of DO are shown by a darker red in this heatmap, computed as in the sketch below. Now we see that there are a few groups of curves with particular anomalies: one group around function 25, one around function 60, and one with functions near the bottom. Note that the global outlyingness measure fdo flags outlying rows in this heatmap, whereas the dark spots inside the heatmap can be seen as outlying cells. It is also possible to sort the rows of the heatmap according to their fdo values. Note that the wavelength at which a dark spot in the heatmap occurs makes it possible to identify the chemical element responsible.
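The heatmap and the fdo values both derive from the matrix of local DO values. A minimal R sketch for curves with univariate function values follows (the names are ours; for the bivariate glass example one would apply doMulti to the intensity and its derivative instead of doUniv):

    # Local DO values and the fdo (6) for an n x T matrix of curves,
    # with weights W summing to 1 over the T domain points.
    fdoCurves <- function(Ymat, W = rep(1 / ncol(Ymat), ncol(Ymat))) {
      DO <- sapply(seq_len(ncol(Ymat)),
                   function(j) doUniv(Ymat[, j], Ymat[, j]))
      list(DO  = DO,                    # n x T matrix, rows give the heatmap
           fdo = as.vector(DO %*% W))   # weighted average per curve
    }

Plotting the rows of DO as colors gives a heatmap like Figure 8, and sorting them by fdo ranks the curves.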

Figure 8: Heatmap of the DO values of the glass data.

3 Outlier Detection

3.1 A Cutoff for Directional Outlyingness

When analyzing a real data set we do not know its underlying distribution, but still we would like a rough indication of which observations should be flagged as outliers. For this purpose we need an approximate cutoff value on the DO. We first consider non-functional data, leaving the functional case for the next subsection.

Let Y = {y_1, ..., y_n} be a d-variate dataset (d ≥ 1) with directional outlyingness values {DO_1, ..., DO_n}. The DO_i have a right-skewed distribution, so we transform them to {LDO_1, ..., LDO_n} = {log(0.1 + DO_1), ..., log(0.1 + DO_n)}, of which the majority is closer to gaussian. Then we center and normalize the resulting values in a robust way and compare them to a high gaussian quantile, flagging y_i as outlying whenever LDO_i > med(LDO) + MAD(LDO) Φ^{-1}(0.995).
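In R the rule can be sketched as follows. The name doCutoff is ours; the function returns the equivalent cutoff c_DO of (7) just below, and R's mad() already includes the consistency factor 1/Φ^{-1}(0.75).

    # Cutoff (7) on the DO values via the log-transformed LDO.
    doCutoff <- function(DO) {
      LDO <- log(0.1 + DO)
      exp(median(LDO) + mad(LDO) * qnorm(0.995)) - 0.1
    }
    # flagged <- DO > doCutoff(DO)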

This is equivalent to comparing the original DO values to the cutoff

    c_{DO} = \exp\big( med(LDO) + MAD(LDO) \, \Phi^{-1}(0.995) \big) - 0.1 .    (7)

For an illustration we return to the family income data of Figure 1. The blue vertical line in Figure 9 corresponds to the DO cutoff, whereas the orange line is the result of the same computation applied to the Stahel-Donoho outlyingness. The DO cutoff is the more conservative one, because it takes the skewness of the income distribution into account.

Figure 9: Outlier cutoffs for the family income data.

Figure 10 shows the DO and SDO cutoffs for the bivariate bloodfat data of Figure 5. The DO captures the skewness in the data and flags only one point as outlying, whereas the SDO takes a more symmetric view and also flags five of the presumed inliers.

Figure 10: Outlier detection on the bloodfat data.

To investigate this effect when the data follow a known distribution we carry out a simulation. We generate 50 samples of 500 observations from the bivariate Marshall-Olkin distribution (Marshall and Olkin, 1967) with parameters λ_1 = 1, λ_2 = 4 and λ_12 = 0.5. To these samples we add 10% of contamination at several positions on the lines g_1(x) = 3 - x and g_2(x) = x, shown in Figure 11; a sketch of one such contaminated sample follows the figure.

Figure 11: Marshall-Olkin data with outliers along the lines (a) y = g_1(x) and (b) y = g_2(x).
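One replication of this setup can be sketched in R as follows. The function rMarshallOlkin uses the standard common-shock construction of the Marshall-Olkin distribution; appending the 50 outliers to 450 clean observations is our own reading of the design.

    # Marshall-Olkin bivariate exponential via the common-shock construction.
    rMarshallOlkin <- function(n, l1 = 1, l2 = 4, l12 = 0.5) {
      e1 <- rexp(n, l1); e2 <- rexp(n, l2); e12 <- rexp(n, l12)
      cbind(pmin(e1, e12), pmin(e2, e12))
    }
    X  <- rMarshallOlkin(450)                        # inliers
    xc <- 2                                          # x-coordinate of outliers
    X  <- rbind(X, cbind(rep(xc, 50), rep(xc, 50)))  # 10% outliers on y = g2(x)

The fraction of the 50 contaminated points with DO above the cutoff, averaged over the replications, yields one point of the curves in Figure 12.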

Figure 12 shows the average percentage of outliers found as a function of the x-coordinate of the contamination. The left panel is for contamination along g_1(x), where we see that the DO finds the outliers faster than the SDO, both on the left and on the right. On the other hand, for contamination along g_2(x) the projection of the inliers on that direction is very skewed. As a result, the DO finds the outliers faster on the left and slightly slower on the right, as it should.

Figure 12: Marshall-Olkin data: percentage of flagged outliers along both lines.

3.2 The Functional Outlier Map

When the data set consists of functions there can be several types of outlyingness. As an aid to distinguish between them, Hubert et al. (2015) introduced a graphical tool called the functional outlier map (FOM). Here we will extend the FOM to the new DO measure and add a cutoff to it, in order to increase its utility.

Consider a functional dataset Y = {Y_1, Y_2, ..., Y_n}. The fdo [see (6)] of a function Y_i can be interpreted as the average outlyingness of its (possibly multivariate) function values. We now also measure the variability of its DO values, by

    vdo(Y_i; Y) = \frac{ \mathrm{stdev}_j \big( DO(Y_i(t_j); Y(t_j)) \big) }{ 1 + fdo(Y_i; Y) } .    (8)

Note that (8) has the fdo in the denominator in order to measure relative instead of absolute variability. This can be understood as follows. Suppose that the functions Y_i are centered around zero and that Y_k(t_j) = 2 Y_i(t_j) for all j. Then stdev_j(DO(Y_k(t_j); Y(t_j))) = 2 stdev_j(DO(Y_i(t_j); Y(t_j))) but their relative variability is the same. Because fdo(Y_k; Y) = 2 fdo(Y_i; Y), putting fdo in the denominator normalizes for this. In the numerator we could also compute a weighted standard deviation with the weights W(t_j) from (6). The FOM is then the scatter plot of the points

    ( fdo(Y_i; Y), vdo(Y_i; Y) )    (9)

for i = 1, ..., n.

Its goal is to reveal outliers in the data, and its interpretation is fairly straightforward. Points in the lower left part of the FOM represent regular functions which hold a central position in the data set. Points in the lower right part are functions with a high fdo but a low variability of their DO values. This happens for shift outliers, i.e. functions which have a regular shape but are shifted on the whole domain. Points in the upper left part have a low fdo but a high vdo. Typical examples are local outliers, i.e. functions which only display outlyingness over a small part of their domain. The points in the upper right part of the FOM have both a high fdo and a high vdo. These correspond to functions which are strongly outlying on a substantial part of their domain.

Figure 13: FOM of the glass data.

As an illustration we revisit the glass data. Their FOM in Figure 13 contains a lot more information than their fdo values alone in Figure 7. In the heatmap (Figure 8) we noticed three groups of outliers, which also stand out in the FOM. The first group consists of the spectra 20, 22, 23, 28, 30, 31 and 33. Among these, number 30 lies furthest to the right in the FOM. It corresponds to row 30 in Figure 8, which has a dark red piece. It does not look like a shift outlier, for which the row would have a more homogeneous color (hence a lower vdo). The second group, with functions 57-63, occupies a similar position in the FOM. The group standing out the most consists of the functions near the bottom of the heatmap. They are situated

in the upper part of the FOM, indicating that they are shape outliers. Indeed, they deviate strongly from the majority in three fairly small series of wavelengths. Their outlyingness is thus more local than that of the previous two groups.

We now add a new feature to the FOM, namely a rule to flag outliers. For this we define the combined functional outlyingness (CFO) of a function Y_i as

    CFO_i = CFO(Y_i; Y) = \sqrt{ \big( fdo_i / med(fdo) \big)^2 + \big( vdo_i / med(vdo) \big)^2 }    (10)

where fdo_i = fdo(Y_i; Y) and med(fdo) = med(fdo_1, ..., fdo_n), and similarly for vdo. Note that the CFO characterizes the points in the FOM through their Euclidean distance to the origin, after scaling. We expect outliers to have a large CFO. In general, the distribution of the CFO is unknown but skewed to the right. To construct a cutoff for the CFO we use the same reasoning as for the cutoff (7) on the fdo: first we compute LCFO_i = log(0.1 + CFO_i) for all i = 1, ..., n, and then we flag function Y_i as outlying if

    \frac{ LCFO_i - med(LCFO) }{ MAD(LCFO) } > \Phi^{-1}(0.995) .    (11)

This yields the dashed curve (which is part of an ellipse) in the FOM of Figure 13; a sketch combining (8), (10) and (11) is given below.
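The following R sketch starts from the n x T matrix of local DO values; the name fomFlag is ours, not the authors'.

    # FOM coordinates and the CFO-based flagging rule.
    fomFlag <- function(DO, W = rep(1 / ncol(DO), ncol(DO))) {
      fdo  <- as.vector(DO %*% W)                                  # (6)
      vdo  <- apply(DO, 1, sd) / (1 + fdo)                         # (8)
      CFO  <- sqrt((fdo / median(fdo))^2 + (vdo / median(vdo))^2)  # (10)
      LCFO <- log(0.1 + CFO)
      flag <- (LCFO - median(LCFO)) / mad(LCFO) > qnorm(0.995)     # (11)
      data.frame(fdo = fdo, vdo = vdo, CFO = CFO, flag = flag)
    }

Plotting vdo against fdo gives the FOM, and the boundary where (11) becomes an equality traces the dashed curve.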

4 Application to Image Data

Images are functions on a bivariate domain. In practice the domain is a grid of discrete points, e.g. the horizontal and vertical pixels of an image. It is convenient to use two indices j = 1, ..., J and k = 1, ..., K, one for each dimension of the grid, to characterize these points. An image (or a surface) is then a function on the J × K points of the grid. Note that the function values can be univariate, like gray intensities, but they can also be multivariate, e.g. the intensities of red, green and blue (RGB). In general we will write an image dataset as a sample Y = {Y_1, Y_2, ..., Y_n} where each Y_i is a function from {(j, k); j = 1, ..., J and k = 1, ..., K} to R^d. The fdo (6) and vdo (8) notions that we saw for functional data with a univariate domain can easily be extended to functions with a bivariate domain by computing

    fdo(Y_i; Y) = \sum_{j=1}^{J} \sum_{k=1}^{K} DO(Y_i(j, k); Y(j, k)) \, W_{jk}    (12)

where the weights W_{jk} must satisfy \sum_{j=1}^{J} \sum_{k=1}^{K} W_{jk} = 1, and

    vdo(Y_i; Y) = \frac{ \mathrm{stdev}_{j,k} \big( DO(Y_i(j, k); Y(j, k)) \big) }{ 1 + fdo(Y_i; Y) }    (13)

where the standard deviation can also be weighted by the W_{jk}. (The simplest weight function is the constant W_{jk} = 1/(JK) for all j = 1, ..., J and k = 1, ..., K.) Note that (12) and (13) can trivially be extended to functions with domains in more than 2 dimensions, such as three-dimensional images consisting of voxels. In each case we obtain fdo_i and vdo_i values that we can plot in a FOM, with cutoff value (11).

As an illustration we analyze a dataset containing MRI brain images of 416 subjects aged between 18 and 96 (Marcus et al., 2007), which can be freely accessed online. For each subject several images are provided; we will use the masked atlas-registered gain field-corrected images resampled to 1 mm isotropic pixels. The masking has set all non-brain pixels to an intensity value of zero. The provided images are already normalized, meaning that the size of the head is exactly the same in each image. The images have 176 by 208 pixels, with grayscale values between 0 and 255. All together we thus have 416 observed images Y_i containing univariate intensity values Y_i(j, k), where j = 1, ..., J = 176 and k = 1, ..., K = 208.

There is more information in such an image than just the raw values. We can incorporate shape information by computing the gradient in every pixel of the image. The gradient in pixel (j, k) is defined as the 2-dimensional vector

    \nabla Y_i(j, k) = \Big( \frac{\partial Y_i(j, k)}{\partial j}, \frac{\partial Y_i(j, k)}{\partial k} \Big)

in which the derivatives have to be approximated numerically. In the pixels at the boundary of the brain we compute forward and backward finite differences, and for the other pixels we employ central differences. In the horizontal direction we thus compute one of three expressions:

    \frac{\partial Y_i(j, k)}{\partial j} = \begin{cases} \big( -3 Y_i(j, k) + 4 Y_i(j+1, k) - Y_i(j+2, k) \big)/2 & \text{(forward difference)} \\ \big( Y_i(j+1, k) - Y_i(j-1, k) \big)/2 & \text{(central difference)} \\ \big( Y_i(j-2, k) - 4 Y_i(j-1, k) + 3 Y_i(j, k) \big)/2 & \text{(backward difference)} \end{cases}

depending on where the pixel is located. The derivatives in the vertical direction are computed analogously.
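These finite differences are straightforward to code. The following R sketch (gradHoriz is our own name) applies the one-sided formulas at the image border; in the analysis above they are applied at the boundary of the brain mask instead.

    # Horizontal gradient with second-order accurate differences.
    gradHoriz <- function(img) {                 # img is a J x K matrix
      J <- nrow(img)
      G <- matrix(NA_real_, J, ncol(img))
      G[1, ] <- (-3 * img[1, ] + 4 * img[2, ] - img[3, ]) / 2         # forward
      G[J, ] <- (img[J - 2, ] - 4 * img[J - 1, ] + 3 * img[J, ]) / 2  # backward
      G[2:(J - 1), ] <- (img[3:J, ] - img[1:(J - 2), ]) / 2           # central
      G
    }

The vertical gradient is obtained by applying the same function to t(img) and transposing the result back.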

Incorporating these derivatives yields a dataset in which the final Y_i(j, k) are trivariate. For each subject we thus have three data matrices which represent the original MRI image and its derivatives in both directions. Figure 14 shows these three matrices for subject number 387.

Figure 14: Original MRI image of subject 387, and its derivatives in the horizontal and vertical direction.

The functional DO of an MRI image Y_i is given by (12), where DO(Y_i(j, k); Y(j, k)) is the DO of the trivariate point Y_i(j, k) relative to the trivariate dataset {Y_1(j, k), ..., Y_416(j, k)}. In this example we have set the weight W_{jk} equal to zero at the grid points that are not part of the brain, shown as the black pixels around it. The grid points inside the brain receive full weight.

Figure 15 shows the resulting FOM, which indicates the presence of several outliers. Image 26 has the highest fdo combined with a relatively low vdo. This suggests a shift outlier, i.e. a function whose values are all shifted relative to the majority of the data. Images 29 and 92 have a large fdo in combination with a high vdo, indicating that they have strongly outlying subdomains. Images 108, 88 and 234 have an fdo which is on the high end relative to the dataset, which by itself does not make them outlying. Only in combination with their large vdo are they flagged as outliers. These images have strongly outlying subdomains which are smaller than those of functions 29 and 92. The remaining flagged images are fairly close to the cutoff, meaning they are merely borderline cases.

In order to find out why a particular image is outlying it is instructive to look at a heatmap of its DO values. In Figure 16 we compare the MRI images (on the left) and

Figure 15: FOM of the MRI dataset.

the DO heatmaps (on the right) of subjects 387, 92, and 26. DO values of 5 or higher received the darkest color. Image 387 has the smallest CFO value, and can be thought of as the most central image in the dataset. As expected, the DO heatmap of image 387 shows very few outlying pixels. For subject 92, the DO heatmap nicely marks the region in which the MRI image deviates most from the majority of the images. Note that the boundaries of this region have the highest outlyingness. This is due to including the derivatives in the analysis, as they emphasize the pixels at which the grayscale intensity changes. The DO heatmap of subject 26 does not show any extremely outlying region but has a rather high outlyingness over the whole domain, which explains its large fdo and regular vdo value. The actual MRI image to its left is globally lighter than the others, confirming that it is a shift outlier.

5 Application to Video

We analyze a surveillance video of a beach, filmed with a static camera (Li et al., 2004). This dataset can be found online and consists of 633 frames.

Figure 16: MRI image (left) and DO heatmap (right) of subjects 387 (top), 92 (middle), and 26 (bottom).

The first 18 seconds of the video show a beach with a tree, as in the leftmost panel of Figure 17. Then a man enters the screen from the left (second panel), disappears behind the tree (third panel), and then reappears to the right of the tree and stays on screen until the end of the video. The frames have 160 by 128 pixels and are stored using the RGB

(Red, Green and Blue) color model, so each frame corresponds to three matrices of size 160 × 128. Overall we have 633 frames Y_i containing trivariate values Y_i(j, k) for j = 1, ..., J = 160 and k = 1, ..., K = 128.

Figure 17: Frames number 100, 487, 491 and 500 from the video dataset.

Computing the fdo (12) on this data set is time consuming since we have to execute the projection pursuit algorithm (5) in R^3 for each pixel, so 160 × 128 = 20,480 times. The entire computation took about an hour and a half on a laptop. Therefore we switch to an alternative computation. We define the componentwise DO of a d-variate point y relative to a d-variate sample Y = {y_1, ..., y_n} as

    CDO(y; Y) = \sqrt{ \sum_{h=1}^{d} DO(y_h; Y_h)^2 }    (14)

where DO(y_h; Y_h) is the univariate DO of the h-th coordinate of y relative to the h-th coordinate of Y. Analyzing the video data with this componentwise procedure took under 2 minutes, so it is about 50 times faster than with projection pursuit, and it produced almost the same FOM.
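The componentwise DO is cheap because it only needs d univariate DO computations per pixel. A minimal R sketch, reusing doUniv from Section 2.1 (the name cdo is ours):

    # Componentwise DO (14) for an n x d matrix Y, e.g. the R, G and B
    # values of one pixel across the n frames.
    cdo <- function(Y) {
      DOcomp <- sapply(seq_len(ncol(Y)), function(h) doUniv(Y[, h], Y[, h]))
      sqrt(rowSums(DOcomp^2))          # one CDO value per data point
    }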

Figure 18 shows the FOM obtained from the CDO computation. The first 480 frames, which depict the beach and the tree with only the ocean surface moving slightly, are found at the bottom left part of the FOM. They fall inside the dashed curve that separates the regular frames from the outliers. At frame 483 the man enters the picture, making the standard deviation of the DO rise slightly. The fdo increases more slowly, as the fraction of the pixels covered by the man is still low at this stage. This frame can thus be seen as locally outlying. The subsequent frames have very high fdo and vdo: in them the man is clearly visible between the left border of the frame and the tree, so these frames have outlying pixels in a substantial part of their domain. The following frames see the man disappear behind the tree, so the fdo goes down as the fraction of outlying pixels decreases. From frame 493 onward the man reappears to the right of the tree and stays on screen until the end. These frames contain many outlying pixels, yielding points in the upper right part of the FOM.

Figure 18: FOM of the video data.

In the FOM we also labeled frame 1, which lies close to the outlyingness border. Further inspection indicated that this frame is a bit lighter than the others, which might be due to the initialization of the camera at the start of the video.

In addition to the FOM we can draw DO heatmaps of the individual frames. For frames 100, 487, 491 and 500, Figure 19 shows the raw frame on the left, the DO heatmap in the middle and the FOM on the right, in which a blue circle marks the position of the frame. In this figure we can follow the man's path in the FOM, while the DO heatmaps show exactly where the man is in those frames. We have created a video in which the raw frame, the DO heatmap and the FOM evolve alongside each other; it can be downloaded from our website.

Figure 19: Left: Frames 100, 487, 491 and 500 from the video. Middle: DO heatmaps of these frames. Right: FOM with a blue marker at the position of each frame.

6 Conclusion

The notion of directional outlyingness (DO) is well-suited for skewed distributions. It has good robustness properties, and lends itself to the analysis of univariate, multivariate, and functional data, in which both the domain and the function values can be multivariate.

Rough cutoffs for outlier detection are available. The DO is also a building block of several graphical tools like DO heatmaps, DO contours, and the functional outlier map (FOM). These proved useful when analyzing spectra, MRI images, and surveillance video.

In the MRI images we added gradients to the data in order to reflect shape/spatial information. In video data we could also add the numerical derivative in the time direction. In our example this would make the frames 6-dimensional, but the componentwise DO in (14) would remain fast to compute.

7 Available software

R-code for computing the DO and reproducing the examples is available from our website.

References

Arribas-Gil, A. and Romo, J. (2014), Shape outlier detection and visualization for functional data: the outliergram, Biostatistics 15.

Brys, G., Hubert, M. and Rousseeuw, P. J. (2005), A robustification of Independent Component Analysis, Journal of Chemometrics 19.

Brys, G., Hubert, M. and Struyf, A. (2004), A robust measure of skewness, Journal of Computational and Graphical Statistics 13.

Donoho, D. (1982), Breakdown properties of multivariate location estimators, Ph.D. Qualifying paper, Dept. of Statistics, Harvard University, Boston.

Febrero-Bande, M., Galeano, P. and González-Manteiga, W. (2008), Outlier detection in functional data by depth measures, with application to identify abnormal NOx levels, Environmetrics 19.

Ferraty, F. and Vieu, P. (2006), Nonparametric Functional Data Analysis: Theory and Practice, Springer, New York.

Hampel, F., Ronchetti, E., Rousseeuw, P. and Stahel, W. (1986), Robust Statistics: The Approach Based on Influence Functions, Wiley, New York.

Hand, D., Lunn, A., McConway, A. and Ostrowski, E. (1994), A Handbook of Small Data Sets, Chapman and Hall, London.

Hubert, M., Rousseeuw, P. J. and Segaert, P. (2015), Multivariate functional outlier detection (with discussion), Statistical Methods & Applications 24.

Hubert, M. and Van der Veeken, S. (2008), Outlier detection for skewed data, Journal of Chemometrics 22.

Hyndman, R. and Shang, H. (2010), Rainbow plots, bagplots, and boxplots for functional data, Journal of Computational and Graphical Statistics 19.

Lemberge, P., De Raedt, I., Janssens, K., Wei, F. and Van Espen, P. (2000), Quantitative Z-analysis of 16th-17th century archaeological glass vessels using PLS regression of EPXMA and µ-XRF data, Journal of Chemometrics 14.

Li, L., Huang, W., Gu, I. Y.-H. and Tian, Q. (2004), Statistical modeling of complex backgrounds for foreground object detection, IEEE Transactions on Image Processing 13.

Marcus, D., Wang, T., Parker, J., Csernansky, J., Morris, J. and Buckner, R. (2007), Open Access Series of Imaging Studies (OASIS): Cross-sectional MRI data in young, middle aged, nondemented, and demented older adults, Journal of Cognitive Neuroscience 19.

Marshall, A. W. and Olkin, I. (1967), A multivariate exponential distribution, Journal of the American Statistical Association 62.

Martin, R. and Zamar, R. (1993), Bias robust estimation of scale, The Annals of Statistics 21.

Ramsay, J. and Silverman, B. (2005), Functional Data Analysis, 2nd edn, Springer, New York.

Rousseeuw, P. and Croux, C. (1994), The bias of k-step M-estimators, Statistics & Probability Letters 20.

Stahel, W. (1981), Robuste Schätzungen: infinitesimale Optimalität und Schätzungen von Kovarianzmatrizen, PhD thesis, ETH Zürich.

Sun, Y. and Genton, M. (2011), Functional boxplots, Journal of Computational and Graphical Statistics 20.

SUPPLEMENTARY MATERIAL

Proof of Lemma 1(i). Let µ ∈ R be fixed. For the function ρ_c we have that

    t^2 \int_{\mu}^{\infty} \rho_c\Big( \frac{x - \mu}{t} \Big) dF(x) = \int_{\mu}^{\infty} \Big\{ \Big( \frac{x - \mu}{c} \Big)^2 1_{\frac{|x-\mu|}{t} \le c} + t^2 \, 1_{\frac{|x-\mu|}{t} > c} \Big\} dF(x) = \int_{0}^{\infty} \Big\{ \Big( \frac{u}{c} \Big)^2 1_{0 \le u \le ct} + t^2 \, 1_{ct < u} \Big\} dF(\mu + u).

The function t \mapsto (u/c)^2 \, 1_{0 \le u \le ct} + t^2 \, 1_{ct < u} is increasing in t > 0, and even strictly increasing at large u. This proves (i) since f(x) > 0 in all x.

Proof of Lemma 1(ii). Fix σ > 0. It follows from the Leibniz integral rule that

    \frac{\partial}{\partial t} \Big\{ \sigma^2 \int_{t}^{\infty} \rho_c\Big( \frac{x - t}{\sigma} \Big) dF(x) \Big\} = -\sigma \int_{t}^{\infty} \rho_c'\Big( \frac{x - t}{\sigma} \Big) dF(x)

because ρ_c(0) = 0. Note now that ρ_c'((x - t)/σ) > 0 for t < x < t + σc and ρ_c'((x - t)/σ) = 0 for x > t + σc. This implies that

    \frac{\partial}{\partial t} \Big\{ \sigma^2 \int_{t}^{\infty} \rho_c\Big( \frac{x - t}{\sigma} \Big) dF(x) \Big\} < 0   for all t.

Proof of Proposition 1. Let 0 < ε < 0.25 be fixed and let F_{ε,H} be a minimizing distribution, i.e.

    \inf_{F_{\varepsilon,G} \in \mathcal{F}_\varepsilon} s_a(F_{\varepsilon,G}) = s_a(F_{\varepsilon,H})

with F_{ε,G} = (1 - ε)F + εG. Inserting the contaminated distribution F_{ε,H} into s_a yields the scale

    s_a^2(F_{\varepsilon,H}) = \frac{s_{o,a}^2(F_{\varepsilon,H})}{\alpha} \Big\{ (1-\varepsilon) \int_{med(F_{\varepsilon,H})}^{\infty} \rho_c\Big( \frac{x - med(F_{\varepsilon,H})}{s_{o,a}(F_{\varepsilon,H})} \Big) dF(x) + \varepsilon \int_{med(F_{\varepsilon,H})}^{\infty} \rho_c\Big( \frac{x - med(F_{\varepsilon,H})}{s_{o,a}(F_{\varepsilon,H})} \Big) dH(x) \Big\}.    (15)

For simplicity, put

    W_1(F_{\varepsilon,H}) = \int_{med(F_{\varepsilon,H})}^{\infty} \rho_c\Big( \frac{x - med(F_{\varepsilon,H})}{s_{o,a}(F_{\varepsilon,H})} \Big) dF(x)   and   W_2(F_{\varepsilon,H}) = \int_{med(F_{\varepsilon,H})}^{\infty} \rho_c\Big( \frac{x - med(F_{\varepsilon,H})}{s_{o,a}(F_{\varepsilon,H})} \Big) dH(x).    (16)

We then have the contaminated scale

    s_a^2(F_{\varepsilon,H}) = \frac{s_{o,a}^2(F_{\varepsilon,H})}{\alpha} \big\{ (1-\varepsilon) W_1(F_{\varepsilon,H}) + \varepsilon W_2(F_{\varepsilon,H}) \big\}.    (17)

Denote by Q_{2,ε} = F_{ε,H}^{-1}(0.5) and Q_{3,ε} = F_{ε,H}^{-1}(0.75) the median and the third quartile of the contaminated distribution.

For the distribution H it has to hold that H(Q_{3,ε}) = 1 and lim_{x ↑ Q_{2,ε}} H(x) = 0. This can be seen as follows. Suppose 1 - H(Q_{3,ε}) = p ∈ (0, 1]. Then consider F_{ε,H'} where

    H'(x) = H(x) + p \, 1_{[Q_{2,\varepsilon}, \infty)}(x)   for x ∈ (-∞, Q_{3,ε}]   and   H'(x) = 1   else,

and denote by Q'_{2,ε} and Q'_{3,ε} the median and third quartile of F_{ε,H'}. Note that Q'_{2,ε} = Q_{2,ε} and Q'_{3,ε} < Q_{3,ε}. Therefore we have s_{o,a}(F_{ε,H'}) < s_{o,a}(F_{ε,H}). It then follows from Lemma 1(i) that

    s_{o,a}(F_{\varepsilon,H'})^2 (1-\varepsilon) W_1(F_{\varepsilon,H'}) < s_{o,a}(F_{\varepsilon,H})^2 (1-\varepsilon) W_1(F_{\varepsilon,H})

and W_2(F_{ε,H'}) = 0 < W_2(F_{ε,H}), so that also s_{o,a}(F_{ε,H'})^2 ε W_2(F_{ε,H'}) < s_{o,a}(F_{ε,H})^2 ε W_2(F_{ε,H}). It now follows that s_a(F_{ε,H'}) < s_a(F_{ε,H}), which is a contradiction since H minimizes s_a. Therefore H(Q_{3,ε}) = 1. A similar argument can be made to show that lim_{x ↑ Q_{2,ε}} H(x) = 0. It follows that H(Q_{3,ε}) = 1 and H(x) = 0 for all x < Q_{2,ε}, so all the mass of H is inside [Q_{2,ε}, Q_{3,ε}].

We can now argue that H must have all its mass in Q_{2,ε}. Note that if H(Q_{3,ε}) = 1 and lim_{x ↑ Q_{2,ε}} H(x) = 0 we have Q_{2,ε} = F^{-1}\big( \frac{1}{2(1-\varepsilon)} \big) and Q_{3,ε} ∈ \big[ F^{-1}\big( \frac{3-4\varepsilon}{4(1-\varepsilon)} \big), F^{-1}\big( \frac{3}{4(1-\varepsilon)} \big) \big], depending on lim_{x ↑ Q_{3,ε}} H(x). Given that Q_{2,ε} is fixed, we can minimize W_1(F_{ε,H}) by

minimizing Q_{3,ε}. Now Q_{3,ε} is minimal for H = \Delta\big( F^{-1}\big( \frac{1}{2(1-\varepsilon)} \big) \big), as this yields Q_{3,ε} = F^{-1}\big( \frac{3-4\varepsilon}{4(1-\varepsilon)} \big). Note that this choice of H to minimize Q_{3,ε} is not unique, as any H which makes lim_{x ↑ Q_{3,ε}} H(x) = 1 does the job. Note finally that W_2(F_{ε,H}) is also minimal for H = \Delta\big( F^{-1}\big( \frac{1}{2(1-\varepsilon)} \big) \big) as ρ_c(t) is nondecreasing in t, and this choice of H yields W_2(F_{ε,H}) = 0.

We now know that H = \Delta\big( F^{-1}\big( \frac{1}{2(1-\varepsilon)} \big) \big) minimizes s_a(F_{ε,H}). Furthermore, we have Q_{2,ε} = F^{-1}\big( \frac{1}{2(1-\varepsilon)} \big) = B^+(ε, med, F) and Q_{3,ε} = F^{-1}\big( \frac{3-4\varepsilon}{4(1-\varepsilon)} \big). Therefore the implosion bias of s_a is

    B^-(\varepsilon, s_a, F)^2 = \frac{B^-(\varepsilon, s_{o,a}, F)^2}{\alpha} \Big\{ (1-\varepsilon) \int_{B^+(\varepsilon, med, F)}^{\infty} \rho_c\Big( \frac{x - B^+(\varepsilon, med, F)}{B^-(\varepsilon, s_{o,a}, F)} \Big) dF(x) \Big\}

where B^+(ε, med, F) = F^{-1}\big( \frac{1}{2(1-\varepsilon)} \big) and B^-(ε, s_{o,a}, F) = \big\{ F^{-1}\big( \frac{3-4\varepsilon}{4(1-\varepsilon)} \big) - F^{-1}\big( \frac{1}{2(1-\varepsilon)} \big) \big\} / \Phi^{-1}(0.75).

Proof of Proposition 2. Let 0 < ε < 0.25 be fixed and let F_{ε,H} be a maximizing distribution, i.e.

    \sup_{F_{\varepsilon,G} \in \mathcal{F}_\varepsilon} s_a(F_{\varepsilon,G}) = s_a(F_{\varepsilon,H})

with F_{ε,G} = (1 - ε)F + εG. Inserting the contaminated distribution F_{ε,H} into s_a yields the scale (15), which can be rewritten as in (16) and (17).

For the distribution H it has to hold that H(Q_{3,ε}) = lim_{x ↑ Q_{2,ε}} H(x). This can be seen as follows. Suppose H(Q_{3,ε}) - lim_{x ↑ Q_{2,ε}} H(x) = p ∈ (0, 1]. Now put e = B^+(ε, med, F) + c B^+(ε, s_{o,a}, F) and consider the distribution F_{ε,H'} where

    H'(x) = H(x)   for x ∈ (-∞, Q_{2,ε}),
    H'(x) = lim_{y ↑ Q_{2,ε}} H(y)   for x ∈ [Q_{2,ε}, Q_{3,ε}],
    H'(x) = H(x) - p + p \, 1_{[e, \infty)}(x)   for x ∈ (Q_{3,ε}, ∞),

and denote by Q'_{2,ε} and Q'_{3,ε} the median and third quartile of F_{ε,H'}. Note that Q'_{2,ε} = Q_{2,ε} and Q'_{3,ε} > Q_{3,ε}. Therefore s_{o,a}(F_{ε,H'}) > s_{o,a}(F_{ε,H}) and thus

    s_{o,a}(F_{\varepsilon,H'})^2 (1-\varepsilon) W_1(F_{\varepsilon,H'}) > s_{o,a}(F_{\varepsilon,H})^2 (1-\varepsilon) W_1(F_{\varepsilon,H})

because of Lemma 1(i). Furthermore, W_2(F_{ε,H'}) > W_2(F_{ε,H})

because ρ_c(t) is nondecreasing in t, so that also s_{o,a}(F_{ε,H'})^2 ε W_2(F_{ε,H'}) > s_{o,a}(F_{ε,H})^2 ε W_2(F_{ε,H}). It now follows that s_a(F_{ε,H'}) > s_a(F_{ε,H}), which is a contradiction since H maximizes s_a. Therefore H(Q_{3,ε}) = lim_{x ↑ Q_{2,ε}} H(x), so H has no mass inside [Q_{2,ε}, Q_{3,ε}]. Without loss of generality we can thus assume that H is of the form H = d \, \Delta(e_1) + (1 - d) \, \Delta(e_2) with e_1 = B^-(ε, med, F) - c \, B^+(ε, s_{o,a}, F) and e_2 = B^+(ε, med, F) + c \, B^+(ε, s_{o,a}, F), where d ∈ [0, 1]. This choice of e_1 and e_2 is not unique but it maximizes s_a(F_{ε,H}) because ρ_c(t) is nondecreasing in |t|. Inserting the distribution F_d := F_{ε,H} yields

    s_a^2(F_d) = \frac{s_{o,a}^2(F_d)}{\alpha} \Big\{ (1-\varepsilon) \int_{Q_{2,d}}^{\infty} \rho_c\Big( \frac{x - Q_{2,d}}{s_{o,a}(F_d)} \Big) dF(x) + \varepsilon (1 - d) \Big\}    (18)

where Q_{2,d} = F^{-1}\big( \frac{1 - 2d\varepsilon}{2(1-\varepsilon)} \big), Q_{3,d} = F^{-1}\big( \frac{3 - 4d\varepsilon}{4(1-\varepsilon)} \big) and s_{o,a}(F_d) = (Q_{3,d} - Q_{2,d}) / \Phi^{-1}(0.75). Note that this expression depends on d but no longer on e_1 and e_2. We will show that this expression is maximized for d = 0.

First we show that s_{o,a}(F_d) is maximized for d = 0. Let

    g(d) := s_{o,a}(F_d) = (Q_{3,d} - Q_{2,d}) / \Phi^{-1}(0.75)    (19)

for any d ∈ [0, 1]. Note that

    \xi = \frac{3 - 4\varepsilon d}{4(1-\varepsilon)} - \frac{2 - 4\varepsilon d}{4(1-\varepsilon)} = \frac{1}{4(1-\varepsilon)}

does not depend on d. Therefore, we can write g(d) = (F^{-1}(v + ξ) - F^{-1}(v)) / \Phi^{-1}(0.75) where v = \frac{2 - 4\varepsilon d}{4(1-\varepsilon)} is a strictly decreasing function of d. Note that we can write

    g(d) = \frac{1}{\Phi^{-1}(0.75)} \int_{v}^{v+\xi} \frac{du}{f(F^{-1}(u))} .

The density f is symmetric about some m, and by affine equivariance we can assume m = 0 w.l.o.g. Since f is unimodal with f(x) > 0 for all x, the function u \mapsto f(F^{-1}(u)) is strictly decreasing up to its minimum (corresponding to the mode of f) and then strictly increasing. Therefore, g(d) is maximal for v as large as possible, i.e. for d = 0. In that case we have v = \frac{1}{2(1-\varepsilon)} > \frac{1}{2}.

Next, we maximize

    h(\sigma, d) := \frac{\sigma^2}{\alpha} \Big\{ (1-\varepsilon) \int_{Q_{2,d}}^{\infty} \rho_c\Big( \frac{x - Q_{2,d}}{\sigma} \Big) dF(x) + \varepsilon (1 - d) \Big\}    (20)

for any fixed σ > 0. This is equivalent to maximizing

    \int_{q}^{\infty} \rho_c\Big( \frac{x - q}{\sigma} \Big) dF(x) + \frac{\varepsilon}{1-\varepsilon} (1 - d)    (21)

where q is such that F(q) ∈ \big[ \frac{1 - 2\varepsilon}{2(1-\varepsilon)}, \frac{1}{2(1-\varepsilon)} \big] = \frac{1}{2} ± \frac{\varepsilon}{2(1-\varepsilon)}. Note that \frac{\varepsilon}{1-\varepsilon}(1 - d) = F(q) + \frac{2\varepsilon - 1}{2(1-\varepsilon)}, where the second term does not depend on q. Maximizing (21) with respect to q is therefore equivalent to maximizing

    \int_{q}^{q + c\sigma} \Big( \frac{x - q}{c\sigma} \Big)^2 dF(x) - \int_{q}^{q + c\sigma} dF(x).

Note that this is equal to

    \int_{0}^{c\sigma} \Big\{ \Big( \frac{x}{c\sigma} \Big)^2 - 1 \Big\} dF(q + x) = \int_{0}^{c\sigma} \frac{x^2 - \sigma^2 c^2}{\sigma^2 c^2} \, f(q + x) \, dx.

For all x in [0, cσ] it holds that x^2 - \sigma^2 c^2 ≤ 0, hence the latter integral is maximized by minimizing f(q + x) for all x ∈ [0, cσ]. For this q must take on its highest possible value q_1 = F^{-1}\big( \frac{1}{2} + \frac{\varepsilon}{2(1-\varepsilon)} \big), because we then have f(q_1 + x) ≤ f(q_2 + x) for all x in [0, cσ] and all q_2 in \big[ F^{-1}\big( \frac{1}{2} - \frac{\varepsilon}{2(1-\varepsilon)} \big), F^{-1}\big( \frac{1}{2} + \frac{\varepsilon}{2(1-\varepsilon)} \big) \big]. Therefore, (20) is maximized for d = 0.

We now know that (19) and (20) satisfy max_d g(d) = g(0) and max_d h(σ, d) = h(σ, 0) for all σ > 0. By Lemma 1(i), h(σ, 0) is increasing in σ. Combining these results yields

    \max_d s_a^2(F_d) = \max_d h(g(d), d) \le \max_{d_1} \max_{d_2} h(g(d_1), d_2) = \max_{d_1} h(g(d_1), 0) = h(g(0), 0)

so s_a^2(F_d) is maximized for d = 0. Therefore s_a^2(F_{ε,H}) is maximized for H = \Delta(e_2), hence Q_{2,ε} = F^{-1}\big( \frac{1}{2(1-\varepsilon)} \big) = B^+(ε, med, F) and Q_{3,ε} = F^{-1}\big( \frac{3}{4(1-\varepsilon)} \big). The explosion bias is thus

    B^+(\varepsilon, s_a, F)^2 = \frac{B^+(\varepsilon, s_{o,a}, F)^2}{\alpha} \Big\{ (1-\varepsilon) \int_{B^+(\varepsilon, med, F)}^{\infty} \rho_c\Big( \frac{x - B^+(\varepsilon, med, F)}{B^+(\varepsilon, s_{o,a}, F)} \Big) dF(x) + \varepsilon \Big\}

where B^+(ε, s_{o,a}, F) = \big\{ F^{-1}\big( \frac{3}{4(1-\varepsilon)} \big) - F^{-1}\big( \frac{1}{2(1-\varepsilon)} \big) \big\} / \Phi^{-1}(0.75).

Proof of Proposition 3. Plugging the contaminated distribution F_{ε,z} = (1 - ε)F + εΔ(z) into the functional form (4) of s_a yields

    s_a^2(F_{\varepsilon,z}) = \frac{s_{o,a}^2(F_{\varepsilon,z})}{\alpha} \int_{med(F_{\varepsilon,z})}^{\infty} \rho_c\Big( \frac{x - med(F_{\varepsilon,z})}{s_{o,a}(F_{\varepsilon,z})} \Big) dF_{\varepsilon,z}(x).

We take the derivative with respect to ε and evaluate it in ε = 0. Note that ρ_c(t) is not differentiable at t = -c and t = c, but as these two points form a set of measure zero this does not affect the integral containing ρ_c'(t). We also use that ρ_c(0) = 0 and IF(z, s_a^2, F) = 2 s_a(F) IF(z, s_a, F), yielding the desired expression. For F = Φ we have IF(z, s_{o,a}, Φ) =


More information

Accurate and Powerful Multivariate Outlier Detection

Accurate and Powerful Multivariate Outlier Detection Int. Statistical Inst.: Proc. 58th World Statistical Congress, 11, Dublin (Session CPS66) p.568 Accurate and Powerful Multivariate Outlier Detection Cerioli, Andrea Università di Parma, Dipartimento di

More information

Robust control charts for time series data

Robust control charts for time series data Robust control charts for time series data Christophe Croux K.U. Leuven & Tilburg University Sarah Gelper Erasmus University Rotterdam Koen Mahieu K.U. Leuven Abstract This article presents a control chart

More information

Regression Analysis for Data Containing Outliers and High Leverage Points

Regression Analysis for Data Containing Outliers and High Leverage Points Alabama Journal of Mathematics 39 (2015) ISSN 2373-0404 Regression Analysis for Data Containing Outliers and High Leverage Points Asim Kumer Dey Department of Mathematics Lamar University Md. Amir Hossain

More information

Nonlinear Support Vector Machines through Iterative Majorization and I-Splines

Nonlinear Support Vector Machines through Iterative Majorization and I-Splines Nonlinear Support Vector Machines through Iterative Majorization and I-Splines P.J.F. Groenen G. Nalbantov J.C. Bioch July 9, 26 Econometric Institute Report EI 26-25 Abstract To minimize the primal support

More information

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved. 1-1 Chapter 1 Sampling and Descriptive Statistics 1-2 Why Statistics? Deal with uncertainty in repeated scientific measurements Draw conclusions from data Design valid experiments and draw reliable conclusions

More information

A Brief Overview of Robust Statistics

A Brief Overview of Robust Statistics A Brief Overview of Robust Statistics Olfa Nasraoui Department of Computer Engineering & Computer Science University of Louisville, olfa.nasraoui_at_louisville.edu Robust Statistical Estimators Robust

More information

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of

More information

Statistical Practice

Statistical Practice Statistical Practice A Note on Bayesian Inference After Multiple Imputation Xiang ZHOU and Jerome P. REITER This article is aimed at practitioners who plan to use Bayesian inference on multiply-imputed

More information

ON THE MAXIMUM BIAS FUNCTIONS OF MM-ESTIMATES AND CONSTRAINED M-ESTIMATES OF REGRESSION

ON THE MAXIMUM BIAS FUNCTIONS OF MM-ESTIMATES AND CONSTRAINED M-ESTIMATES OF REGRESSION ON THE MAXIMUM BIAS FUNCTIONS OF MM-ESTIMATES AND CONSTRAINED M-ESTIMATES OF REGRESSION BY JOSE R. BERRENDERO, BEATRIZ V.M. MENDES AND DAVID E. TYLER 1 Universidad Autonoma de Madrid, Federal University

More information

Summary statistics. G.S. Questa, L. Trapani. MSc Induction - Summary statistics 1

Summary statistics. G.S. Questa, L. Trapani. MSc Induction - Summary statistics 1 Summary statistics 1. Visualize data 2. Mean, median, mode and percentiles, variance, standard deviation 3. Frequency distribution. Skewness 4. Covariance and correlation 5. Autocorrelation MSc Induction

More information

TITLE : Robust Control Charts for Monitoring Process Mean of. Phase-I Multivariate Individual Observations AUTHORS : Asokan Mulayath Variyath.

TITLE : Robust Control Charts for Monitoring Process Mean of. Phase-I Multivariate Individual Observations AUTHORS : Asokan Mulayath Variyath. TITLE : Robust Control Charts for Monitoring Process Mean of Phase-I Multivariate Individual Observations AUTHORS : Asokan Mulayath Variyath Department of Mathematics and Statistics, Memorial University

More information

Unitary Dynamics and Quantum Circuits

Unitary Dynamics and Quantum Circuits qitd323 Unitary Dynamics and Quantum Circuits Robert B. Griffiths Version of 20 January 2014 Contents 1 Unitary Dynamics 1 1.1 Time development operator T.................................... 1 1.2 Particular

More information

THE BREAKDOWN POINT EXAMPLES AND COUNTEREXAMPLES

THE BREAKDOWN POINT EXAMPLES AND COUNTEREXAMPLES REVSTAT Statistical Journal Volume 5, Number 1, March 2007, 1 17 THE BREAKDOWN POINT EXAMPLES AND COUNTEREXAMPLES Authors: P.L. Davies University of Duisburg-Essen, Germany, and Technical University Eindhoven,

More information

Robust Linear Discriminant Analysis and the Projection Pursuit Approach

Robust Linear Discriminant Analysis and the Projection Pursuit Approach Robust Linear Discriminant Analysis and the Projection Pursuit Approach Practical aspects A. M. Pires 1 Department of Mathematics and Applied Mathematics Centre, Technical University of Lisbon (IST), Lisboa,

More information

Robust Preprocessing of Time Series with Trends

Robust Preprocessing of Time Series with Trends Robust Preprocessing of Time Series with Trends Roland Fried Ursula Gather Department of Statistics, Universität Dortmund ffried,gatherg@statistik.uni-dortmund.de Michael Imhoff Klinikum Dortmund ggmbh

More information

IMPROVING THE SMALL-SAMPLE EFFICIENCY OF A ROBUST CORRELATION MATRIX: A NOTE

IMPROVING THE SMALL-SAMPLE EFFICIENCY OF A ROBUST CORRELATION MATRIX: A NOTE IMPROVING THE SMALL-SAMPLE EFFICIENCY OF A ROBUST CORRELATION MATRIX: A NOTE Eric Blankmeyer Department of Finance and Economics McCoy College of Business Administration Texas State University San Marcos

More information

Discrete Mathematics and Probability Theory Fall 2015 Lecture 21

Discrete Mathematics and Probability Theory Fall 2015 Lecture 21 CS 70 Discrete Mathematics and Probability Theory Fall 205 Lecture 2 Inference In this note we revisit the problem of inference: Given some data or observations from the world, what can we infer about

More information

Statistical Depth Function

Statistical Depth Function Machine Learning Journal Club, Gatsby Unit June 27, 2016 Outline L-statistics, order statistics, ranking. Instead of moments: median, dispersion, scale, skewness,... New visualization tools: depth contours,

More information

Over-enhancement Reduction in Local Histogram Equalization using its Degrees of Freedom. Alireza Avanaki

Over-enhancement Reduction in Local Histogram Equalization using its Degrees of Freedom. Alireza Avanaki Over-enhancement Reduction in Local Histogram Equalization using its Degrees of Freedom Alireza Avanaki ABSTRACT A well-known issue of local (adaptive) histogram equalization (LHE) is over-enhancement

More information

General Foundations for Studying Masking and Swamping Robustness of Outlier Identifiers

General Foundations for Studying Masking and Swamping Robustness of Outlier Identifiers General Foundations for Studying Masking and Swamping Robustness of Outlier Identifiers Robert Serfling 1 and Shanshan Wang 2 University of Texas at Dallas This paper is dedicated to the memory of Kesar

More information

Small Sample Corrections for LTS and MCD

Small Sample Corrections for LTS and MCD myjournal manuscript No. (will be inserted by the editor) Small Sample Corrections for LTS and MCD G. Pison, S. Van Aelst, and G. Willems Department of Mathematics and Computer Science, Universitaire Instelling

More information

Least Squares Approximation

Least Squares Approximation Chapter 6 Least Squares Approximation As we saw in Chapter 5 we can interpret radial basis function interpolation as a constrained optimization problem. We now take this point of view again, but start

More information

Independent Component (IC) Models: New Extensions of the Multinormal Model

Independent Component (IC) Models: New Extensions of the Multinormal Model Independent Component (IC) Models: New Extensions of the Multinormal Model Davy Paindaveine (joint with Klaus Nordhausen, Hannu Oja, and Sara Taskinen) School of Public Health, ULB, April 2008 My research

More information

Exploratory data analysis: numerical summaries

Exploratory data analysis: numerical summaries 16 Exploratory data analysis: numerical summaries The classical way to describe important features of a dataset is to give several numerical summaries We discuss numerical summaries for the center of a

More information

The influence function of the Stahel-Donoho estimator of multivariate location and scatter

The influence function of the Stahel-Donoho estimator of multivariate location and scatter The influence function of the Stahel-Donoho estimator of multivariate location and scatter Daniel Gervini University of Zurich Abstract This article derives the influence function of the Stahel-Donoho

More information

Designing Information Devices and Systems I Fall 2018 Lecture Notes Note 2

Designing Information Devices and Systems I Fall 2018 Lecture Notes Note 2 EECS 6A Designing Information Devices and Systems I Fall 08 Lecture Notes Note Vectors and Matrices In the previous note, we introduced vectors and matrices as a way of writing systems of linear equations

More information

A Note on Bayesian Inference After Multiple Imputation

A Note on Bayesian Inference After Multiple Imputation A Note on Bayesian Inference After Multiple Imputation Xiang Zhou and Jerome P. Reiter Abstract This article is aimed at practitioners who plan to use Bayesian inference on multiplyimputed datasets in

More information

Lecture 12 Robust Estimation

Lecture 12 Robust Estimation Lecture 12 Robust Estimation Prof. Dr. Svetlozar Rachev Institute for Statistics and Mathematical Economics University of Karlsruhe Financial Econometrics, Summer Semester 2007 Copyright These lecture-notes

More information

Sections 2.3 and 2.4

Sections 2.3 and 2.4 1 / 24 Sections 2.3 and 2.4 Note made by: Dr. Timothy Hanson Instructor: Peijie Hou Department of Statistics, University of South Carolina Stat 205: Elementary Statistics for the Biological and Life Sciences

More information

STAT 200 Chapter 1 Looking at Data - Distributions

STAT 200 Chapter 1 Looking at Data - Distributions STAT 200 Chapter 1 Looking at Data - Distributions What is Statistics? Statistics is a science that involves the design of studies, data collection, summarizing and analyzing the data, interpreting the

More information

Lecture 1: Descriptive Statistics

Lecture 1: Descriptive Statistics Lecture 1: Descriptive Statistics MSU-STT-351-Sum 15 (P. Vellaisamy: MSU-STT-351-Sum 15) Probability & Statistics for Engineers 1 / 56 Contents 1 Introduction 2 Branches of Statistics Descriptive Statistics

More information

OPTIMAL B-ROBUST ESTIMATORS FOR THE PARAMETERS OF THE GENERALIZED HALF-NORMAL DISTRIBUTION

OPTIMAL B-ROBUST ESTIMATORS FOR THE PARAMETERS OF THE GENERALIZED HALF-NORMAL DISTRIBUTION REVSTAT Statistical Journal Volume 15, Number 3, July 2017, 455 471 OPTIMAL B-ROBUST ESTIMATORS FOR THE PARAMETERS OF THE GENERALIZED HALF-NORMAL DISTRIBUTION Authors: Fatma Zehra Doğru Department of Econometrics,

More information

Descriptive statistics

Descriptive statistics Patrick Breheny February 6 Patrick Breheny to Biostatistics (171:161) 1/25 Tables and figures Human beings are not good at sifting through large streams of data; we understand data much better when it

More information

Designing Information Devices and Systems I Spring 2019 Lecture Notes Note 2

Designing Information Devices and Systems I Spring 2019 Lecture Notes Note 2 EECS 6A Designing Information Devices and Systems I Spring 9 Lecture Notes Note Vectors and Matrices In the previous note, we introduced vectors and matrices as a way of writing systems of linear equations

More information

Robust estimation of scale and covariance with P n and its application to precision matrix estimation

Robust estimation of scale and covariance with P n and its application to precision matrix estimation Robust estimation of scale and covariance with P n and its application to precision matrix estimation Garth Tarr, Samuel Müller and Neville Weber USYD 2013 School of Mathematics and Statistics THE UNIVERSITY

More information

Copulas, a novel approach to model spatial and spatio-temporal dependence

Copulas, a novel approach to model spatial and spatio-temporal dependence Copulas, a novel approach to model spatial and spatio-temporal dependence Benedikt Gräler 1, Hannes Kazianka 2, Giovana Mira de Espindola 3 1 Institute for Geoinformatics, University of Münster, Germany

More information

Robust estimation of principal components from depth-based multivariate rank covariance matrix

Robust estimation of principal components from depth-based multivariate rank covariance matrix Robust estimation of principal components from depth-based multivariate rank covariance matrix Subho Majumdar Snigdhansu Chatterjee University of Minnesota, School of Statistics Table of contents Summary

More information

9 Correlation and Regression

9 Correlation and Regression 9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the

More information

Chapter 4. Displaying and Summarizing. Quantitative Data

Chapter 4. Displaying and Summarizing. Quantitative Data STAT 141 Introduction to Statistics Chapter 4 Displaying and Summarizing Quantitative Data Bin Zou (bzou@ualberta.ca) STAT 141 University of Alberta Winter 2015 1 / 31 4.1 Histograms 1 We divide the range

More information

1 Measures of the Center of a Distribution

1 Measures of the Center of a Distribution 1 Measures of the Center of a Distribution Qualitative descriptions of the shape of a distribution are important and useful. But we will often desire the precision of numerical summaries as well. Two aspects

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

CHAPTER 4 VARIABILITY ANALYSES. Chapter 3 introduced the mode, median, and mean as tools for summarizing the

CHAPTER 4 VARIABILITY ANALYSES. Chapter 3 introduced the mode, median, and mean as tools for summarizing the CHAPTER 4 VARIABILITY ANALYSES Chapter 3 introduced the mode, median, and mean as tools for summarizing the information provided in an distribution of data. Measures of central tendency are often useful

More information

Robust scale estimation with extensions

Robust scale estimation with extensions Robust scale estimation with extensions Garth Tarr, Samuel Müller and Neville Weber School of Mathematics and Statistics THE UNIVERSITY OF SYDNEY Outline The robust scale estimator P n Robust covariance

More information

September Math Course: First Order Derivative

September Math Course: First Order Derivative September Math Course: First Order Derivative Arina Nikandrova Functions Function y = f (x), where x is either be a scalar or a vector of several variables (x,..., x n ), can be thought of as a rule which

More information

COMPARISON OF THE ESTIMATORS OF THE LOCATION AND SCALE PARAMETERS UNDER THE MIXTURE AND OUTLIER MODELS VIA SIMULATION

COMPARISON OF THE ESTIMATORS OF THE LOCATION AND SCALE PARAMETERS UNDER THE MIXTURE AND OUTLIER MODELS VIA SIMULATION (REFEREED RESEARCH) COMPARISON OF THE ESTIMATORS OF THE LOCATION AND SCALE PARAMETERS UNDER THE MIXTURE AND OUTLIER MODELS VIA SIMULATION Hakan S. Sazak 1, *, Hülya Yılmaz 2 1 Ege University, Department

More information

An Overview of Multiple Outliers in Multidimensional Data

An Overview of Multiple Outliers in Multidimensional Data Sri Lankan Journal of Applied Statistics, Vol (14-2) An Overview of Multiple Outliers in Multidimensional Data T. A. Sajesh 1 and M.R. Srinivasan 2 1 Department of Statistics, St. Thomas College, Thrissur,

More information

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Madeleine B. Thompson Radford M. Neal Abstract The shrinking rank method is a variation of slice sampling that is efficient at

More information

On robust and efficient estimation of the center of. Symmetry.

On robust and efficient estimation of the center of. Symmetry. On robust and efficient estimation of the center of symmetry Howard D. Bondell Department of Statistics, North Carolina State University Raleigh, NC 27695-8203, U.S.A (email: bondell@stat.ncsu.edu) Abstract

More information

Why is the field of statistics still an active one?

Why is the field of statistics still an active one? Why is the field of statistics still an active one? It s obvious that one needs statistics: to describe experimental data in a compact way, to compare datasets, to ask whether data are consistent with

More information

Bagging During Markov Chain Monte Carlo for Smoother Predictions

Bagging During Markov Chain Monte Carlo for Smoother Predictions Bagging During Markov Chain Monte Carlo for Smoother Predictions Herbert K. H. Lee University of California, Santa Cruz Abstract: Making good predictions from noisy data is a challenging problem. Methods

More information

Function-on-Scalar Regression with the refund Package

Function-on-Scalar Regression with the refund Package University of Haifa From the SelectedWorks of Philip T. Reiss July 30, 2012 Function-on-Scalar Regression with the refund Package Philip T. Reiss, New York University Available at: https://works.bepress.com/phil_reiss/28/

More information

Robustness of Principal Components

Robustness of Principal Components PCA for Clustering An objective of principal components analysis is to identify linear combinations of the original variables that are useful in accounting for the variation in those original variables.

More information

Influence Functions for a General Class of Depth-Based Generalized Quantile Functions

Influence Functions for a General Class of Depth-Based Generalized Quantile Functions Influence Functions for a General Class of Depth-Based Generalized Quantile Functions Jin Wang 1 Northern Arizona University and Robert Serfling 2 University of Texas at Dallas June 2005 Final preprint

More information

DD-Classifier: Nonparametric Classification Procedure Based on DD-plot 1. Abstract

DD-Classifier: Nonparametric Classification Procedure Based on DD-plot 1. Abstract DD-Classifier: Nonparametric Classification Procedure Based on DD-plot 1 Jun Li 2, Juan A. Cuesta-Albertos 3, Regina Y. Liu 4 Abstract Using the DD-plot (depth-versus-depth plot), we introduce a new nonparametric

More information

8 The SVD Applied to Signal and Image Deblurring

8 The SVD Applied to Signal and Image Deblurring 8 The SVD Applied to Signal and Image Deblurring We will discuss the restoration of one-dimensional signals and two-dimensional gray-scale images that have been contaminated by blur and noise. After an

More information

Introduction Robust regression Examples Conclusion. Robust regression. Jiří Franc

Introduction Robust regression Examples Conclusion. Robust regression. Jiří Franc Robust regression Robust estimation of regression coefficients in linear regression model Jiří Franc Czech Technical University Faculty of Nuclear Sciences and Physical Engineering Department of Mathematics

More information

Homework 2 Solutions Kernel SVM and Perceptron

Homework 2 Solutions Kernel SVM and Perceptron Homework 2 Solutions Kernel SVM and Perceptron CMU 1-71: Machine Learning (Fall 21) https://piazza.com/cmu/fall21/17115781/home OUT: Sept 25, 21 DUE: Oct 8, 11:59 PM Problem 1: SVM decision boundaries

More information

Identification of Multivariate Outliers: A Performance Study

Identification of Multivariate Outliers: A Performance Study AUSTRIAN JOURNAL OF STATISTICS Volume 34 (2005), Number 2, 127 138 Identification of Multivariate Outliers: A Performance Study Peter Filzmoser Vienna University of Technology, Austria Abstract: Three

More information

Network Simulation Chapter 5: Traffic Modeling. Chapter Overview

Network Simulation Chapter 5: Traffic Modeling. Chapter Overview Network Simulation Chapter 5: Traffic Modeling Prof. Dr. Jürgen Jasperneite 1 Chapter Overview 1. Basic Simulation Modeling 2. OPNET IT Guru - A Tool for Discrete Event Simulation 3. Review of Basic Probabilities

More information

Lecture December 2009 Fall 2009 Scribe: R. Ring In this lecture we will talk about

Lecture December 2009 Fall 2009 Scribe: R. Ring In this lecture we will talk about 0368.4170: Cryptography and Game Theory Ran Canetti and Alon Rosen Lecture 7 02 December 2009 Fall 2009 Scribe: R. Ring In this lecture we will talk about Two-Player zero-sum games (min-max theorem) Mixed

More information

Asymptotic Relative Efficiency in Estimation

Asymptotic Relative Efficiency in Estimation Asymptotic Relative Efficiency in Estimation Robert Serfling University of Texas at Dallas October 2009 Prepared for forthcoming INTERNATIONAL ENCYCLOPEDIA OF STATISTICAL SCIENCES, to be published by Springer

More information

In the previous chapter, we learned how to use the method of least-squares

In the previous chapter, we learned how to use the method of least-squares 03-Kahane-45364.qxd 11/9/2007 4:40 PM Page 37 3 Model Performance and Evaluation In the previous chapter, we learned how to use the method of least-squares to find a line that best fits a scatter of points.

More information

Design of Experiments

Design of Experiments Design of Experiments D R. S H A S H A N K S H E K H A R M S E, I I T K A N P U R F E B 19 TH 2 0 1 6 T E Q I P ( I I T K A N P U R ) Data Analysis 2 Draw Conclusions Ask a Question Analyze data What to

More information

Outlier Detection via Feature Selection Algorithms in

Outlier Detection via Feature Selection Algorithms in Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS032) p.4638 Outlier Detection via Feature Selection Algorithms in Covariance Estimation Menjoge, Rajiv S. M.I.T.,

More information

Economics Division University of Southampton Southampton SO17 1BJ, UK. Title Overlapping Sub-sampling and invariance to initial conditions

Economics Division University of Southampton Southampton SO17 1BJ, UK. Title Overlapping Sub-sampling and invariance to initial conditions Economics Division University of Southampton Southampton SO17 1BJ, UK Discussion Papers in Economics and Econometrics Title Overlapping Sub-sampling and invariance to initial conditions By Maria Kyriacou

More information

High Breakdown Point Estimation in Regression

High Breakdown Point Estimation in Regression WDS'08 Proceedings of Contributed Papers, Part I, 94 99, 2008. ISBN 978-80-7378-065-4 MATFYZPRESS High Breakdown Point Estimation in Regression T. Jurczyk Charles University, Faculty of Mathematics and

More information

Estimation of large dimensional sparse covariance matrices

Estimation of large dimensional sparse covariance matrices Estimation of large dimensional sparse covariance matrices Department of Statistics UC, Berkeley May 5, 2009 Sample covariance matrix and its eigenvalues Data: n p matrix X n (independent identically distributed)

More information

A PRACTICAL APPLICATION OF A ROBUST MULTIVARIATE OUTLIER DETECTION METHOD

A PRACTICAL APPLICATION OF A ROBUST MULTIVARIATE OUTLIER DETECTION METHOD A PRACTICAL APPLICATION OF A ROBUST MULTIVARIATE OUTLIER DETECTION METHOD Sarah Franklin, Marie Brodeur, Statistics Canada Sarah Franklin, Statistics Canada, BSMD, R.H.Coats Bldg, 1 lth floor, Ottawa,

More information

Notes for CS542G (Iterative Solvers for Linear Systems)

Notes for CS542G (Iterative Solvers for Linear Systems) Notes for CS542G (Iterative Solvers for Linear Systems) Robert Bridson November 20, 2007 1 The Basics We re now looking at efficient ways to solve the linear system of equations Ax = b where in this course,

More information