Concavity in Data Analysis


Journal of Classification 20:77-92 (2003)

Michael P. Windham, University of South Alabama

Abstract: Concave functions play a fundamental role in the structure and minimization of badness-of-fit functions used in data analysis when extreme values of either parameters or data need to be penalized. This paper summarizes my findings about this role. I also describe three examples where concave functions are useful: building measures of badness-of-fit, building robust M-estimators, and building clustering objective functions. Finally, using the common thread of concavity, all three will be combined to build a comprehensive, flexible procedure for robust cluster analysis.

Keywords: Optimization, Objective functions, Least squares, Partitioning, Badness-of-fit, k-means, Mixture analysis, Fuzzy clustering, M-Estimators, Robust cluster analysis.

Author's address: Michael P. Windham, Department of Mathematics and Statistics, University of South Alabama, Mobile, AL, USA

1. Introduction

Twenty years ago I gave a lecture on cluster analysis wherein I stated that certain clustering algorithms involved minimizing an objective function built with a concave function. A member of the audience insisted that one does not minimize concave functions. Admitting that this person was correct, I spent the remainder of the year trying to explain to him and to myself why concavity was there and why it was the reason the algorithm worked. Since then I have continued to work in cluster analysis and also in robust statistical estimation and optimization. I have found many times that concave functions have played a fundamental role in the structure and minimization of badness-of-fit functions used in these areas of data analysis. This paper describes that role.

As examples, I will describe three instances of the use of concave functions: building measures of badness-of-fit, building robust M-estimators, and building clustering objective functions. In each of these cases extreme values of either parameters or data need to be penalized. I will discuss how and why concavity is involved in the penalizing process. Finally, I will put all three together to build a comprehensive, flexible fit criterion for robust cluster analysis. Concavity also facilitates the minimization of the fit criteria using iterative majorization (Heiser, 1995), as will be described in Section 7. This paper organizes and unifies the common thread of concavity that runs through the several examples. The role of concavity has not always been recognized in data analysis, and a fundamental role it is. The latter is the main point of this paper.

2. Concave Functions

Concave functions are involved in data analysis in reducing the influence of extreme values of some measure of fit and in optimizing an objective function built from the measure. Before examining specific uses of concave functions, I will review the basic facts about them that will be useful here. There are many characterizations of concave functions, but the following is the best for our purposes. A function $f : \mathbb{R}^p \to \mathbb{R}$ is concave on a set $S$ if for each $r$ in $S$ there is a $1 \times p$ matrix $F(r)$ satisfying, for all $r$ and $r_0$ in $S$,

$$f(r) \leq F(r_0)(r - r_0) + f(r_0). \qquad (1)$$

When $f$ is differentiable, the matrix $F(r)$ is its derivative, and concavity simply says that the graph of $f$ always lies below its tangent planes. For this paper, I will use lower-case letters for a concave function and the same upper-case letter for the $1 \times p$ matrix-valued function required for the linear term in (1).

Another, perhaps more common, definition of concavity is that for any $r_1$ and $r_2$ in a convex set $S$ and $0 \leq \lambda \leq 1$,

$$f(\lambda r_1 + (1-\lambda) r_2) \geq \lambda f(r_1) + (1-\lambda) f(r_2).$$

That is, the value of $f$ on a convex combination of vectors exceeds the convex combination of the values.

Suppose that $r$ represents a measure of fit between data and parameters describing data, for which large values suggest poor fit. The effect of concave functions on reducing the influence of extreme values is shown in Figure 1. In each case, $f(r)$ increases as $r$ does, but ever more slowly as $r$ increases, so that the influence of large values of $r$ is reduced.

Figure 1: Types of concave functions (none, delayed, smooth and unbounded, bounded).

In building the measures of fit themselves (Section 3), concavity is used to penalize extreme values of the parameters in the measure, so that desirable minima exist. In robust estimation (Section 4), concavity reduces the influence of outliers, and in cluster analysis (Section 5), concavity localizes on a cluster and reduces the influence, in describing the cluster, of data from other clusters.
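As a rough numerical illustration (mine, not from the paper) of the damping effect just described, the short Python sketch below applies two concave, increasing transforms to a set of raw misfit values and shows how the spread between the smallest and largest contributions shrinks.

import numpy as np

# Illustration only: applying a concave, increasing transform f to a badness-of-fit
# value r compresses large r, so extreme misfits contribute less to a total criterion.
r = np.array([0.5, 1.0, 4.0, 25.0, 100.0])   # raw squared misfits

transforms = {
    "identity f(r) = r": lambda r: r,
    "median   f(r) = sqrt(r)": np.sqrt,
    "Cauchy   f(r) = log(1 + r)": np.log1p,
}

for name, f in transforms.items():
    vals = f(r)
    # ratio of largest to smallest contribution: concave transforms shrink this
    # ratio, i.e. they damp the influence of the extreme value 100
    print(f"{name:28s} contributions {np.round(vals, 2)}  spread {vals[-1] / vals[0]:.1f}")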

3. Parameter Estimation Using Measures of Fit

The basic goal of exploratory data analysis is to summarize data with manageable quantities that describe some informative characteristics of the data. For example, for a set $X$ of $n$ data points $x$ in $\mathbb{R}^d$, the mean $\bar{x} = \sum_x x / n$ gives a central location for the data, and the covariance or scatter matrix $\hat{S} = \sum_x (x - \bar{x})(x - \bar{x})^T / n$ describes the shape of the data set in terms of ellipsoids determined by a natural choice of a metric, $(x - \bar{x})^T \hat{S}^{-1} (x - \bar{x})$.

It is even more satisfying when a natural choice is a best choice in some sense. For example, the squared Euclidean norm, $\|x - m\|^2$, is a measure of the fit between a data point $x$ and a possible center of concentration $m$, in that the further $m$ is from $x$, the more incompatible they are with each other. It is well known that the mean is the value of $m$ that minimizes $\sum_x \|x - m\|^2$; that is, $\bar{x}$ is a best choice for a center of a data set in that it is on the average least incompatible with the data in the set. It would be nice to have a similar measure of fit that leads to $\hat{S}$ being a best choice. A natural candidate would be to minimize $d(m, S) = \sum_x (x - m)^T S^{-1} (x - m)$, particularly since for any positive semidefinite $S$ we have $d(\bar{x}, S) \leq d(m, S)$ for all $m$. It then suffices to minimize $d(\bar{x}, S) = \sum_x (x - \bar{x})^T S^{-1} (x - \bar{x}) = n \, \mathrm{tr}(\hat{S} S^{-1})$, where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix. Unfortunately, the function $d$ has no minimum as a function of $S$. The problem is that the larger $S$ is (in terms of, say, eigenvalues), the smaller $d$ is. People have added penalty terms to the measure to discourage large values of scale. Finding such terms leads to the first use of concavity, namely using concave functions to penalize large values of scale in measuring fit.

The first step is to describe how to extend a function $g : \mathbb{R} \to \mathbb{R}$ to a function assigning symmetric matrices to symmetric matrices. The function $g$ is extended to a symmetric matrix $S$ by letting $g(S) = B \, \mathrm{diag}(g(\lambda_1), \dots, g(\lambda_d)) B^T$, where $B$ is an orthogonal matrix whose columns are a basis of eigenvectors of $S$ and $\lambda_1, \dots, \lambda_d$ are the corresponding eigenvalues. Using this notion, we have the following. If $g$ is concave and increasing, and $G$ satisfies $g(s) \leq G(s_0)(s - s_0) + g(s_0)$ for all $s$ and $s_0$, then

$$D(m, S) = \sum_x \left[ (x - m)^T G(S)(x - m) + \mathrm{tr}(g(S) - G(S)S) \right] \qquad (2)$$

is minimized by $m = \bar{x}$ and $S = \hat{S}$, the sample mean and covariance matrix. Moreover, if $g(0) \geq 0$, then $(x - m)^T G(S)(x - m) + \mathrm{tr}(g(S) - G(S)S) \geq 0$ for all $x$, $m$, and positive semidefinite $S$.
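The following Python sketch (my own, using my reading of (2) and the choice $g(s) = \log(1 + s)$ discussed later in this section) extends a scalar function to symmetric matrices through the eigendecomposition and checks numerically that the resulting fit measure is smaller at the sample mean and scatter matrix than at a perturbed pair. Function names are illustrative, not from the paper.

import numpy as np

def matrix_fn(fn, S):
    # Apply a scalar function to a symmetric matrix via its eigendecomposition.
    lam, B = np.linalg.eigh(S)
    return B @ np.diag(fn(lam)) @ B.T

def fit_measure(X, m, S, g=np.log1p, G=lambda s: 1.0 / (1.0 + s)):
    # D(m, S) as in (2), with g(s) = log(1 + s) and G = g'.
    GS = matrix_fn(G, S)
    gS = matrix_fn(g, S)
    quad = np.einsum("ni,ij,nj->n", X - m, GS, X - m).sum()
    return quad + len(X) * np.trace(gS - GS @ S)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
m_hat, S_hat = X.mean(axis=0), np.cov(X.T, bias=True)
# D should be smallest at the sample mean and scatter matrix
print(fit_measure(X, m_hat, S_hat), fit_measure(X, m_hat + 1.0, 2.0 * S_hat))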

This result was presented in Windham (2000) without proof, so I will give one here. Clearly, for a fixed positive semidefinite $S$, $D(m, S)$ is minimized for $m = \bar{x}$. It then suffices to show that $D(\bar{x}, S) = n \, \mathrm{tr}(G(S)(\hat{S} - S) + g(S))$ is minimized as a function of $S$ by $S = \hat{S}$. Letting $\hat{\Lambda}$ be a diagonal matrix of eigenvalues of $\hat{S}$, it follows, by moving orthogonal matrices around in the trace function, that it suffices to show that $\mathrm{tr}(G(S)(\hat{\Lambda} - S) + g(S)) \geq \sum_i g(\hat{\lambda}_i)$ for all positive semidefinite $S$. Letting $S = B \Lambda B^T$ and again moving the orthogonal matrix $B$, we have from the concavity of $g$

$$\mathrm{tr}(G(S)(\hat{\Lambda} - S) + g(S)) = \sum_i \Big[ G(\lambda_i)\Big(\sum_j b_{ij}^2 \hat{\lambda}_j - \lambda_i\Big) + g(\lambda_i) \Big] \geq \sum_i g\Big(\sum_j b_{ij}^2 \hat{\lambda}_j\Big).$$

Since rows and columns of $B$ are orthonormal vectors, $\sum_i b_{ij}^2 = \sum_j b_{ij}^2 = 1$, so that $\sum_j b_{ij}^2 \hat{\lambda}_j$ is a convex combination of the eigenvalues of $\hat{S}$. Therefore, from the concavity of the function $g$, we have

$$\sum_i g\Big(\sum_j b_{ij}^2 \hat{\lambda}_j\Big) \geq \sum_i \sum_j b_{ij}^2 g(\hat{\lambda}_j) = \sum_j g(\hat{\lambda}_j),$$

and the result follows. Moreover, if $g(0) \geq 0$, then, as above, $(x - m)^T G(S)(x - m) + \mathrm{tr}(g(S) - G(S)S) = D(m, S)$ for a data set consisting of $x$ alone, so that $D(m, S)$ is minimized by $m = x$ and $S = 0$. That is, the minimum value of $(x - m)^T G(S)(x - m) + \mathrm{tr}(g(S) - G(S)S)$ is $D(x, 0) = \mathrm{tr}(g(0)) \geq 0$, completing the proof.

It should be noted that since $g$ is increasing, we have $G \geq 0$, so that $G(S)$ is at least positive semidefinite. Therefore, $(x - m)^T G(S)(x - m) + \mathrm{tr}(g(S) - G(S)S)$ acts like a metric on the data space.

The goal was to build a fit function that is minimized by the mean and scatter matrix of the data. Any function of the form in (2) is minimized by $\bar{x}$ as long as $G(S)$ is positive semidefinite, which requires only that $G$ be non-negative. If so, then one needs to minimize $\mathrm{tr}(G(S)(\hat{S} - S) + g(S))$ as a function of $S$. For simplicity, consider the one-dimensional case with $g$ any twice differentiable function. For $d(s) = g'(s)(\hat{s} - s) + g(s)$, we have $d'(s) = g''(s)(\hat{s} - s)$, so that $\hat{s}$ is a critical point. It is the concavity of $g$ that ensures that $\hat{s}$ is, in fact, a minimizer.

For example, with $g(s) = \log(s)$, then $G(s) = 1/s$, and we have the measure $(x - m)^T S^{-1}(x - m) + \log \det S$, ignoring constant terms.

This example is popular, but it does not satisfy the condition $g(0) \geq 0$, which ensures that the measure of fit is non-negative and facilitates its use in the applications that follow. The function $g(s) = \log(1 + s)$, for example, does satisfy $g(0) = 0$ and yields

$$(x - m)^T (I + S)^{-1} (x - m) + \mathrm{tr}\big(\log(I + S) - (I + S)^{-1} S\big).$$

The second column of Table 1 lists several functions that can be used for $g$. For functions in the table with two descriptions, except for the Cauchy, the first expression applies for $0 \leq r \leq 1$ and the second for $r > 1$. We now have a procedure for building badness-of-fit measures that are minimized by the mean and scatter matrix of a data set and that have properties that facilitate their use in data analysis.

4. Robust M-estimation

The notion of choosing descriptors for a data set by minimizing total badness-of-fit between points in the set and some summary descriptor of the data is called M-estimation in the context of robust statistics. Huber (1981) and Hampel, Ronchetti, Rousseeuw, and Stahel (1986) discuss M-estimators in detail and give a variety of procedures for estimating location and shape. In general, if $t$ is a descriptor of a data set and $r(x, t)$ is a measure of fit between a data point $x$ and a descriptor $t$, then the M-estimator for $t$ is the value of $t$ that minimizes $\sum_x r(x, t)$. The discussion of the previous section provides measures of fit for M-estimation of location and shape.

Some estimators are not robust, in that their values are sensitive to a few data points that may not represent the information in the bulk of the data. The mean, for example, is sensitive to outliers. The median, on the other hand, is robust to the presence of outliers. The median is also an M-estimator, in that it is the value of $m$ that minimizes $\sum_x \|x - m\|$. The measure of fit used for the median is the composition of the measure used for the mean with a concave, increasing function. In particular, $\|x - m\| = (\|x - m\|^2)^{1/2}$; in other words, we have $f(r) = r^{1/2}$ with $r = \|x - m\|^2$. Therefore, the median should be less sensitive than the mean to outliers, because the concavity of the square root reduces their influence.

This observation is easily generalized to provide a structure for robustizing M-estimators. For a function $r(x, t)$ measuring fit between $x$ and $t$ and a concave, increasing function $h$, let $\rho(x, t) = h(r(x, t))$. Since $h$ is increasing, $\rho$ is also a measure of fit, with the influence of extreme incompatibility reduced by the concavity of $h$. Therefore, minimizing $\sum_x \rho(x, t) = \sum_x h(r(x, t))$ should lead to more robust estimates of $t$ than would be obtained by minimizing $\sum_x r(x, t)$.
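The following one-dimensional sketch (mine, not from the paper) checks the claim above numerically: composing the squared misfit with the concave square root turns the mean into the median and makes the estimate far less sensitive to a single wild observation. The brute-force grid search is purely illustrative.

import numpy as np

# The mean minimizes sum_i (x_i - m)^2; the median minimizes sum_i sqrt((x_i - m)^2).
x = np.array([0.9, 1.0, 1.1, 1.2, 50.0])        # one outlier at 50

def m_estimate(x, f, grid=np.linspace(-10, 60, 70001)):
    # Brute-force M-estimate: the grid value minimizing sum_i f((x_i - m)^2).
    total = f((x[None, :] - grid[:, None]) ** 2).sum(axis=1)
    return grid[total.argmin()]

print("mean  :", m_estimate(x, lambda r: r))    # pulled toward the outlier
print("median:", m_estimate(x, np.sqrt))        # stays with the bulk of the data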

There are many robust M-estimators of location in the literature. What is not so widely known is that often these can be obtained by composing $\|x - m\|^2$ with some concave function. A list of the estimators is given in Table 1 along with the concave function $h$.

Table 1: Robustizers.

    Method     h(r)
    None       r
    Median     r^{1/2}
    Huber      r;   2 r^{1/2} - 1
    Biweight   (1/3)(1 - (1 - r)^3);   1/3
    Cauchy     ((1 + r)^{1-β} - 1)/(1 - β), β ≠ 1;   log(1 + r), β = 1
    Fair       2(r^{1/2} - log(1 + r^{1/2}))
    Logistic   2 log(cosh(r^{1/2}))
    Talwar     r;   1
    Welsch     1 - e^{-r}
    Andrews    (2/π²)(1 - cos(π r^{1/2}));   4/π²

    For functions with two descriptions, except for the Cauchy, the first expression applies for 0 ≤ r ≤ 1 and the second for r > 1. Tuning constants are discussed in Section 4.

These concave functions can be used with any measures of fit, as was illustrated by Verboon and Heiser (1992), who applied the Huber and biweight functions to squared residuals for orthogonal Procrustes analysis.

One caution that must be observed is that the measure of fit is changed by simply rescaling the data. Doing so artificially changes the point in the data space where the robustizing has a particular level of influence. Tuning constants are used to compensate for data scale; that is, use $\rho(x, t) = h(\kappa\, r(x, t))/\kappa$ for a positive tuning constant $\kappa$. Dividing by $\kappa$ is not necessary, but doing so often results in $\kappa = 0$ corresponding to no robustizing. This situation occurs when $h$ is differentiable at zero and $h'(0) = H(0) = 1$. Under these conditions $\rho(x, t)$ approaches $r(x, t)$ as $\kappa$ approaches zero. In fact, whenever $h$ is differentiable at zero, adjusting $h$ by constants will make $h(0) = 0$ and $H(0) = 1$ without affecting its interpretation as a measure of fit. The functions in Table 1 satisfy these conditions except for the median. Tuning constants can also compensate somewhat for the different levels of concavity in different choices for $h$. Suitable tuning constants for each method can be based on values appearing in Holland and Welsch (1977) for achieving 95% efficiency in estimating location under a standard normal model. For clarity, I will not include tuning constants in what follows, but one can apply them as above, if desired.
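As a small sketch (function names and the example values are mine), the Huber and Welsch entries of Table 1 can be coded directly as functions of the squared misfit r, together with their derivatives H = h', which the reweighting algorithms of Section 7 will need; tuning is applied as h(κr)/κ as described above.

import numpy as np

def huber(r):          # r for r <= 1, 2*sqrt(r) - 1 for r > 1
    return np.where(r <= 1.0, r, 2.0 * np.sqrt(r) - 1.0)

def huber_deriv(r):
    return np.where(r <= 1.0, 1.0, 1.0 / np.sqrt(r))

def welsch(r):         # 1 - exp(-r)
    return 1.0 - np.exp(-r)

def welsch_deriv(r):
    return np.exp(-r)

def tuned(h, kappa):
    # h(kappa * r) / kappa, so kappa -> 0 recovers the unrobustized fit h(r) = r.
    return lambda r: h(kappa * r) / kappa

r = np.linspace(0.0, 10.0, 5)
print(np.round(huber(r), 2), np.round(welsch(r), 2), np.round(tuned(welsch, 0.1)(r), 2))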

5. Cluster Analysis

Cluster analysis attempts to identify substructure in a data set or population by grouping objects together into clusters based on their being more similar to each other than to other objects in the study. Objective function methods determine descriptions of clusters that minimize a measure of fit between the descriptions and the data describing the objects. The problem is that one needs to identify and describe a subset of the data that forms a cluster in the presence of other data that should not belong to the cluster. We need to find a description of a cluster by identifying concentrations of objects that best fit a common description, while reducing the influence of objects sufficiently removed from the description that they should be identified with other clusters. Concavity does precisely this. We will see that the contribution to the total fit of a data point is greatest for the cluster description with which it is most compatible, while the influence of fit for other clusters is reduced or even ignored.

For an individual cluster, we will use $r(x, t)$ as above to measure fit between a data point $x$ and a cluster descriptor $t$, such as location and shape. Three famous objective function clustering procedures are partitioning methods, mixture analysis, and fuzzy clustering.

Partitioning: For $C_1, \dots, C_k$ a partition of the data into clusters and descriptors $t_1, \dots, t_k$, one for each cluster, $\sum_i \sum_{x \in C_i} r(x, t_i)$ is the total fit between the data and the given cluster structure, so one finds the $C_1, \dots, C_k$ and $t_1, \dots, t_k$ that minimize this badness-of-fit.

Mixture analysis: This approach assumes that the data are a sample from a population whose probability density is a finite mixture of densities of the form $e^{-r(x, t_i)}$, and the structure is determined by finding $t$ and $p = (p_1, \dots, p_k)$ to maximize $\sum_x \log \sum_i p_i e^{-r(x, t_i)}$, where $p_i$ is the probability of belonging to the $i$-th subpopulation. In this case, $t$ and $p$ are maximum likelihood estimates. Redner and Walker (1984) survey the topic.

Fuzzy clustering: This method is based on the notion that some concepts that appear to define sets really do not. For example, who is in the set of tall people? There is no clear cutoff in height that would universally be accepted as the dividing line between tall and not tall. A fuzzy set is a function that assigns to each object a number between zero and one that measures the degree to which the object has the property that the set represents. A fuzzy partition of $\mathbb{R}^d$ is a vector of $k$ such functions defined on $\mathbb{R}^d$, $u = (u_1, \dots, u_k)$, with $u_i$ the membership function for the $i$-th cluster, restricted so that $\sum_i u_i(x) = 1$ for each $x$. A fuzzy clustering is obtained by finding $u$ and $t$ to minimize $\sum_x \sum_i u_i^m(x)\, r(x, t_i)$, where $m > 1$ is a tuning constant used to adjust the fuzziness of the clusters. The larger $m$ is, the fuzzier the clusters will be; the closer $m$ is to one, the closer the results are to partitioning. Bezdek and Pal (1991) survey the literature on the subject.

These three objective functions can be converted into compositions of the measures of fit for the individual clusters, $r(x, t) = (r(x, t_1), \dots, r(x, t_k))$, with concave functions that are increasing in each variable. These functions are obtained, for partitioning and fuzzy clustering, by replacing the clusters with the sets $C_1, \dots, C_k$ and membership functions $u_1, \dots, u_k$ that minimize the respective objective functions for given descriptors, namely $C_i = \{x : r(x, t_i) = \min_j r(x, t_j)\}$ and $u_i(x) = r(x, t_i)^{1/(1-m)} / \sum_j r(x, t_j)^{1/(1-m)}$. The conversion for mixtures just puts a minus sign in front to change maximizing to minimizing. The function $\tilde{r}_i(x, t, p) = r(x, t_i) - \log p_i$ is used to measure fit for mixtures. In fact, the same can be done for partitioning and fuzzy clustering, so that all three contain a parameter $p = (p_1, \dots, p_k)$ satisfying $\sum_i p_i = 1$ and describing how the data are apportioned among the clusters.

The concave functions, $a$, and the corresponding functions $A = (A_1, \dots, A_k)$ are shown in Table 2.

Table 2: Concave functions in cluster analysis.

    Method        a(r)                              A_i(r)
    Partitioning  min_i r_i                         1 if r_i = min_j r_j, 0 otherwise
    Mixture       -log(Σ_i e^{-r_i})                e^{-r_i} / Σ_j e^{-r_j}
    Fuzzy         (Σ_i r_i^{1/(1-m)})^{1-m}         (r_i^{1/(1-m)} / Σ_j r_j^{1/(1-m)})^m

The resulting fit functions are

Partitioning: $\sum_x \min_i \tilde{r}_i(x, t, p)$

Mixture: $-\sum_x \log \sum_i e^{-\tilde{r}_i(x, t, p)}$

Fuzzy clustering: $\sum_x \big(\sum_i \tilde{r}_i(x, t, p)^{1/(1-m)}\big)^{1-m}$
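The three concave functions and their companion weights from Table 2 are simple to write down in code. The sketch below (function names mine, not the paper's) evaluates them on a vector rt of per-cluster misfits, so one can see how each variant concentrates weight on the best-fitting cluster.

import numpy as np

def partition_a(rt):
    return rt.min()

def partition_A(rt):
    return (rt == rt.min()).astype(float)

def mixture_a(rt):
    return -np.log(np.exp(-rt).sum())

def mixture_A(rt):
    w = np.exp(-rt)
    return w / w.sum()

def fuzzy_a(rt, m=2.0):
    return (rt ** (1.0 / (1.0 - m))).sum() ** (1.0 - m)

def fuzzy_A(rt, m=2.0):
    u = rt ** (1.0 / (1.0 - m))
    return (u / u.sum()) ** m

rt = np.array([0.5, 2.0, 4.0])   # misfits to three cluster descriptions
for a, A in [(partition_a, partition_A), (mixture_a, mixture_A), (fuzzy_a, fuzzy_A)]:
    print(f"{a.__name__}: a = {a(rt):.3f}, A = {np.round(A(rt), 3)}")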

A simplified illustration will show how concavity facilitates cluster analysis. In Figure 2(a) the line $r_0$ represents a measure of incompatibility with 0, in that the further $x$ is from 0 the greater the measure. Similarly, $r_1$ represents a measure of incompatibility with 1. Figure 2(b) shows the associated functions from Table 2 applied to $r_0$ and $r_1$: for partitioning, $a_P = \min(r_0, r_1)$; for mixtures, $a_M = -\log(e^{-r_0} + e^{-r_1})$, translated by a constant for the purpose of the illustration so that its graph goes through 0 and 1; and for fuzzy clustering, $a_F = (r_0^{-1} + r_1^{-1})^{-1}$, using $m = 2$.

Figure 2: Concavity in cluster analysis.

In terms of incompatibility, $a_P$ ignores incompatibility with 1 for $x$ closer to 0 than to 1, and ignores incompatibility with 0 for $x$ closer to 1 than to 0. The concavity ensures that near 0 the sense of incompatibility is preserved, that is, incompatibility with 0 increases as you move away from 0, and similarly for $x$ near 1. The functions for mixture and fuzzy clustering are simply smoother versions of the same phenomenon.

One disadvantage of this approach is that the sense of cluster is lost. The parameters $t_i$ describe the clusters in terms of characteristics like location and shape, but what are the clusters as sets of objects? Fortunately, the notion of cluster membership is still present in the $A_i$'s. In particular, we have the following interpretations.

Partitioning: $C_i = \{x : \tilde{r}_i(x, t, p) = \min_j \tilde{r}_j(x, t, p)\} = \{x : A_i(\tilde{r}(x, t, p)) = 1\}$ is the subset of the data most compatible with the $i$-th cluster description; that is, $C_i$ is the $i$-th cluster.

Mixtures: $A_i(\tilde{r}(x, t, p)) = p_i e^{-r(x, t_i)} / \sum_j p_j e^{-r(x, t_j)}$ is the estimate of the conditional probability of belonging to the $i$-th subpopulation given the data.

Fuzzy clustering: The fuzzy membership function for the $i$-th fuzzy cluster is $u_i(x) = A_i(\tilde{r}(x, t, p))^{1/m} = \tilde{r}_i(x, t, p)^{1/(1-m)} / \sum_j \tilde{r}_j(x, t, p)^{1/(1-m)}$.

The clustering procedures I have discussed so far use continuously varying parameters to describe the clusters. Many procedures do not. For example, the medoid method described in Kaufman and Rousseeuw (1990) attempts to identify objects among those under study to serve as descriptors for the location of the clusters, the medoids. The cluster associated to a given medoid consists of the objects least dissimilar to it. The medoids are chosen to be the objects for which the sum of the dissimilarities to them of the objects associated to them is the least. We can formulate a badness-of-fit function whose minimum identifies the medoids and their clusters as follows. Let $\{j_1, \dots, j_k\}$ be the indices of $k$ distinct objects that are candidates for medoids, and let $d_{jl}$ denote the dissimilarity between the $j$-th and $l$-th objects; then the badness-of-fit function is

$$\min_{\{j_1, \dots, j_k\}} \sum_l \min_i d_{j_i l}.$$

The concave function $\min(\cdot)$ plays its usual role. In particular, $\min_i$ ignores objects not associated to the $i$-th medoid, and $\min_{\{j_1, \dots, j_k\}}$ ignores objects not chosen to be medoids. Windham (1985) describes a fuzzy version of the same idea. The reader is encouraged to create a mixture version as well.
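A small brute-force sketch of the medoid badness-of-fit just displayed (helper names and the toy data are mine): for each candidate set of k objects, sum over all objects the dissimilarity to the nearest candidate, and keep the candidate set with the smallest total.

import itertools
import numpy as np

def medoid_objective(D, medoids):
    # Sum over objects of the dissimilarity to the closest chosen medoid.
    return D[np.ix_(medoids, range(D.shape[0]))].min(axis=0).sum()

def best_medoids(D, k):
    # Exhaustive minimization over candidate sets; practical only for small n.
    n = D.shape[0]
    return min(itertools.combinations(range(n), k), key=lambda s: medoid_objective(D, s))

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (5, 2)), rng.normal(3, 0.3, (5, 2))])
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)     # pairwise dissimilarities
print(best_medoids(D, k=2))   # picks one medoid from each of the two groups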

6. Robust Cluster Analysis

Robust cluster analysis is now quite straightforward to achieve: simply replace $r(x, t_i)$ by $h(r(x, t_i))$. This approach was introduced for mixture analysis in Windham (2000), but the concavity in both cluster analysis and M-estimation makes it possible to use the same idea in partitioning and fuzzy clustering. In particular, for any function $a$ from Table 2, the badness-of-fit function

$$\sum_x a\big(r(x, t_1) - \log p_1, \dots, r(x, t_k) - \log p_k\big) = \sum_x a(\tilde{r}(x, t, p))$$

becomes

$$\sum_x a\big(h(r(x, t_1)) - \log p_1, \dots, h(r(x, t_k)) - \log p_k\big) = \sum_x a(\tilde{\rho}(x, t, p)).$$

Figure 3 shows the impact of robustizing in an example presented in Windham (2000). Figure 3(a) shows the results of finding three clusters using a mixture clustering function with a Cauchy measure of fit, based on $g(s) = \log(1 + s)$. The confusion caused by the presence of the outliers is apparent. Figure 3(b) shows the results obtained with the mixture clustering, the Cauchy fit, and the Welsch robustizer. The robustized result clearly identifies the structure of the bulk of the data without being unduly influenced by the outliers.

Figure 3: Cluster analysis with outliers.

7. Minimization Procedures

The role that concavity plays in constructing minimizing algorithms has been recognized for some time. A comprehensive discussion is given in Heiser (1995), where it is viewed as a special case of iterative majorization.

It is also described in Lange, Hunter and Yang (2000), which includes the algorithm for finding the median as an example. Both of these articles provide a wealth of references to many other applications of the idea in constructing algorithms.

Iterative majorization with concave functions is based on the following consequence of concavity. From (1), one can see that if you are at $r_0$, then you can decrease $f(r)$ by decreasing $F(r_0)\, r$. In particular, if

$$F(r_0)\, r \leq F(r_0)\, r_0, \qquad (3)$$

then $f(r) \leq F(r_0)(r - r_0) + f(r_0) \leq f(r_0)$. Iteratively applying (3) will produce a decreasing sequence of values of $f$. If $f$ is bounded below, the sequence of function values $f(r)$ will converge. Finally, if $f$ is also increasing in each variable, then $F_i(r_0) \geq 0$, so $F(r_0)\, r$ can be decreased by simply decreasing each variable $r_i$, and decreasing $f(r)$ has then been reduced to decreasing a weighted version of $r$. Decreasing $r$ directly may not be possible, however. In fact, in applications $r$ is often a function of some parameter important to the context, and we are seeking the value of the parameter that minimizes $f(r)$. The function $F(r_0)\, r$ is often, at least locally, a convex function of that parameter and can be easily minimized.
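As a concrete sketch of the majorization step (3), consider the spatial median, the example the text attributes to Lange, Hunter and Yang (2000). Here $f(r) = \sum_i r_i^{1/2}$ with $r_i = \|x_i - m\|^2$, so $F_i(r_0) = 1/(2\sqrt{r_{0,i}})$, and minimizing the weighted surrogate over $m$ is a weighted least-squares (Weiszfeld-type) update. The code below is my own illustration; the stopping rule and safeguard are assumptions.

import numpy as np

def spatial_median(X, iters=100, eps=1e-12):
    m = X.mean(axis=0)                              # start from the (non-robust) mean
    for _ in range(iters):
        r = ((X - m) ** 2).sum(axis=1)              # current misfits r_i
        w = 1.0 / (2.0 * np.sqrt(r + eps))          # weights F_i(r0); eps guards r = 0
        m = (w[:, None] * X).sum(axis=0) / w.sum()  # minimizer of sum_i w_i ||x_i - m||^2
    return m

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
print(spatial_median(X))    # stays near the three clustered points, unlike X.mean(0)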

The articles mentioned above also contain detailed discussions of the convergence properties of the algorithms. I mentioned that when the objective function is bounded below, a decreasing sequence of iterates of the function itself will converge. More important is the question of whether the corresponding sequence of parameters converges. In fact, there is no guarantee that they do converge, and even when they do, they may not converge to a minimizer. These articles, along with Bezdek, Hathaway, Howard, Wilson and Windham (1987) and Windham (1987), discuss this problem.

The following describes the iterative procedures for robust estimation, cluster analysis, and finally robust cluster analysis. Each of them is an implementation of (3). The function $F$ becomes a function $H$, such as the derivative of an $h$ from Table 1 in robust estimation, and $A = (A_1, \dots, A_k)$ from Table 2 in cluster analysis.

Robust Estimation: Robust estimators can be computed from the iterative process based on (3) simply by using the appropriate choice of $H$, the derivative of a concave, increasing function, in (4) below. The iterations proceed from the current value of $t$, $t_c$, to the next, $t_+$. We will use $\arg\min_t l(t)$ to denote the value of $t$ that minimizes $l(t)$. Beginning with the initial $t_c$ taken to be the value of $t$ that minimizes $\sum_x r(x, t)$, the iteration is

$$t_+ = \arg\min_t \sum_x H(r(x, t_c))\, r(x, t). \qquad (4)$$

The algorithm reduces the problem of minimizing the robustized criterion to iteratively minimizing a weighted version of the unrobustized function, a problem that is usually easy to solve. For example, if $t = (m, S)$ and $r(x, t) = (x - m)^T G(S)(x - m) + \mathrm{tr}(g(S) - G(S)S)$ as in Section 3, then the parameter updates are as follows. Letting $w_c(x) = H(r(x, t_c)) / \sum_x H(r(x, t_c))$, we have

$$m_+ = \sum_x w_c(x)\, x, \qquad S_+ = \sum_x w_c(x)(x - m_+)(x - m_+)^T.$$

The value $w_c(x)$ simply weights the contribution of $x$ in accordance with the effect of the concave function $h$ used for robustness.
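The following sketch (mine, not the paper's) implements the reweighting iteration (4) for robust location and scatter with the Welsch robustizer, so H(r) = exp(-r). For simplicity it uses the squared Mahalanobis distance as a stand-in for the fit measure of Section 3; the starting values, iteration count, and toy data are illustrative assumptions.

import numpy as np

def robust_location_scatter(X, H=lambda r: np.exp(-r), iters=20):
    m, S = X.mean(axis=0), np.cov(X.T, bias=True)         # unrobustized start
    for _ in range(iters):
        Sinv = np.linalg.inv(S)
        r = np.einsum("ni,ij,nj->n", X - m, Sinv, X - m)   # squared Mahalanobis misfit
        w = H(r)
        w = w / w.sum()                                    # weights w_c(x)
        m = w @ X                                          # m_+ = sum_x w_c(x) x
        S = (w[:, None] * (X - m)).T @ (X - m)             # S_+ = sum_x w_c(x)(x - m_+)(x - m_+)^T
    return m, S

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (95, 2)), rng.normal(8, 0.5, (5, 2))])  # 5% outliers
print(robust_location_scatter(X)[0], X.mean(axis=0))   # robust center vs ordinary mean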

Clustering: The algorithm for minimizing the clustering objective functions iterates from $t_c, p_c$ to $t_+, p_+$ by

$$t_i^+ = \arg\min_t \sum_x A_i(\tilde{r}(x, t_c, p_c))\, r(x, t)$$

$$p_i^+ = \arg\max_{p_i} \sum_x A_i(\tilde{r}(x, t_+, p_c)) \log p_i = \frac{\sum_x A_i(\tilde{r}(x, t_+, p_c))}{\sum_j \sum_x A_j(\tilde{r}(x, t_+, p_c))}.$$

In the case where the parameters are location and scale, that is, $t_i = (m_i, S_i)$, the updates for those parameters are

$$m_i^+ = \sum_x w_i^c(x)\, x, \qquad S_i^+ = \sum_x w_i^c(x)(x - m_i^+)(x - m_i^+)^T,$$

where $w_i^c(x) = A_i(\tilde{r}(x, t_c, p_c)) / \sum_x A_i(\tilde{r}(x, t_c, p_c))$ weights the contribution of $x$ to the next parameter estimates for the $i$-th cluster according to its membership in the $i$-th cluster as described by the current parameter estimates.

Robust Clustering:

$$t_i^+ = \arg\min_t \sum_x A_i(\tilde{\rho}(x, t_c, p_c))\, H(r(x, t_i^c))\, r(x, t)$$

$$p_i^+ = \frac{\sum_x A_i(\tilde{\rho}(x, t_+, p_c))}{\sum_j \sum_x A_j(\tilde{\rho}(x, t_+, p_c))}$$

When the parameters are location and scale, the updates for those parameters are

$$m_i^+ = \sum_x w_i^c(x)\, x, \qquad S_i^+ = \sum_x w_i^c(x)(x - m_i^+)(x - m_i^+)^T,$$

where the weight

$$w_i^c(x) = A_i(\tilde{\rho}(x, t_c, p_c))\, H(r(x, t_i^c)) \Big/ \sum_x A_i(\tilde{\rho}(x, t_c, p_c))\, H(r(x, t_i^c))$$

has two terms, one for clustering and the other for robustness.

8. Conclusion

The influence of extreme values on parameter estimation in data analysis can be controlled with concave functions without seriously complicating the problem to be solved. This paper illustrates this fact with three examples where different things need to be controlled. In the first, extreme values of scale were controlled so that well-behaved measures of fit could be built that would have the covariance or scatter matrix as a minimizer.

In the M-estimation example, the influence of outliers on parameter estimates was controlled with concave functions. Finally, in finding a description of a cluster in a cluster analysis, the influence of data from other clusters is reduced by combining fits to cluster descriptions with concave functions. Moreover, the common thread of applying concave functions to control extreme values allows one to combine all three to build a procedure for robust cluster analysis that estimates location and scale within a cluster.

References

BEZDEK, J.C., HATHAWAY, R.J., HOWARD, R.E., WILSON, C.A., and WINDHAM, M.P. (1987), Local convergence analysis of a grouped variable version of coordinate descent, Journal of Optimization Theory and Applications, 54.

BEZDEK, J.C., and PAL, S.K. (1991), Fuzzy Models for Pattern Recognition, New York: IEEE Press.

HAMPEL, F.R., RONCHETTI, E.M., ROUSSEEUW, P.J., and STAHEL, W.A. (1986), Robust Statistics: The Approach Based on Influence Functions, New York: Wiley.

HEISER, W.J. (1995), Convergent computation by iterative majorization: theory and applications in multidimensional data analysis, in Recent Advances in Descriptive Multivariate Analysis, Ed. W. Krzanowski, Oxford: Clarendon Press.

HOLLAND, P.W., and WELSCH, R.E. (1977), Robust regression using iteratively reweighted least-squares, Communications in Statistics - Theory and Methods, A6(9).

HUBER, P.J. (1981), Robust Statistics, New York: Wiley.

KAUFMAN, L., and ROUSSEEUW, P. (1990), Finding Groups in Data: An Introduction to Cluster Analysis, New York: Wiley.

LANGE, K., HUNTER, D., and YANG, I. (2000), Optimization transfer using surrogate objective functions, Journal of Computational and Graphical Statistics, 9.

REDNER, R.A., and WALKER, H.F. (1984), Mixture densities, maximum likelihood and the EM algorithm, SIAM Review, 26.

VERBOON, P., and HEISER, W.J. (1992), Resistant orthogonal Procrustes analysis, Journal of Classification, 9.

WINDHAM, M.P. (1985), Numerical classification of proximity data with assignment measures, Journal of Classification, 2.

WINDHAM, M.P. (1987), Parameter modification for clustering criteria, Journal of Classification, 4.

WINDHAM, M.P. (2000), Robust clustering, in Data Analysis: Scientific Modeling and Practical Application, Eds. W. Gaul, O. Opitz and M. Schader, Berlin: Springer.


NON-NEGATIVE MATRIX FACTORIZATION FOR PARAMETER ESTIMATION IN HIDDEN MARKOV MODELS. Balaji Lakshminarayanan and Raviv Raich NON-NEGATIVE MATRIX FACTORIZATION FOR PARAMETER ESTIMATION IN HIDDEN MARKOV MODELS Balaji Lakshminarayanan and Raviv Raich School of EECS, Oregon State University, Corvallis, OR 97331-551 {lakshmba,raich@eecs.oregonstate.edu}

More information

Feature selection and classifier performance in computer-aided diagnosis: The effect of finite sample size

Feature selection and classifier performance in computer-aided diagnosis: The effect of finite sample size Feature selection and classifier performance in computer-aided diagnosis: The effect of finite sample size Berkman Sahiner, a) Heang-Ping Chan, Nicholas Petrick, Robert F. Wagner, b) and Lubomir Hadjiiski

More information

Eigenvalues and diagonalization

Eigenvalues and diagonalization Eigenvalues and diagonalization Patrick Breheny November 15 Patrick Breheny BST 764: Applied Statistical Modeling 1/20 Introduction The next topic in our course, principal components analysis, revolves

More information

UNIVERSIDAD CARLOS III DE MADRID MATHEMATICS II EXERCISES (SOLUTIONS )

UNIVERSIDAD CARLOS III DE MADRID MATHEMATICS II EXERCISES (SOLUTIONS ) UNIVERSIDAD CARLOS III DE MADRID MATHEMATICS II EXERCISES (SOLUTIONS ) CHAPTER : Limits and continuit of functions in R n. -. Sketch the following subsets of R. Sketch their boundar and the interior. Stud

More information

Linear Algebra Review. Vectors

Linear Algebra Review. Vectors Linear Algebra Review 9/4/7 Linear Algebra Review By Tim K. Marks UCSD Borrows heavily from: Jana Kosecka http://cs.gmu.edu/~kosecka/cs682.html Virginia de Sa (UCSD) Cogsci 8F Linear Algebra review Vectors

More information

Review of Linear Algebra

Review of Linear Algebra Review of Linear Algebra Definitions An m n (read "m by n") matrix, is a rectangular array of entries, where m is the number of rows and n the number of columns. 2 Definitions (Con t) A is square if m=

More information

Introduction to robust statistics*

Introduction to robust statistics* Introduction to robust statistics* Xuming He National University of Singapore To statisticians, the model, data and methodology are essential. Their job is to propose statistical procedures and evaluate

More information

University of Florida CISE department Gator Engineering. Clustering Part 1

University of Florida CISE department Gator Engineering. Clustering Part 1 Clustering Part 1 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville What is Cluster Analysis? Finding groups of objects such that the objects

More information

Course Notes: Week 4

Course Notes: Week 4 Course Notes: Week 4 Math 270C: Applied Numerical Linear Algebra 1 Lecture 9: Steepest Descent (4/18/11) The connection with Lanczos iteration and the CG was not originally known. CG was originally derived

More information

Calculus of Variation An Introduction To Isoperimetric Problems

Calculus of Variation An Introduction To Isoperimetric Problems Calculus of Variation An Introduction To Isoperimetric Problems Kevin Wang The University of Sydney SSP Working Seminars, MATH2916 May 4, 2013 Contents I Lagrange Multipliers 2 1 Single Constraint Lagrange

More information

Breakdown points of Cauchy regression-scale estimators

Breakdown points of Cauchy regression-scale estimators Breadown points of Cauchy regression-scale estimators Ivan Mizera University of Alberta 1 and Christine H. Müller Carl von Ossietzy University of Oldenburg Abstract. The lower bounds for the explosion

More information