An Analysis of Locally Defined Principal Curves and Surfaces


James McQueen
Department of Statistics, University of Washington, Seattle, WA 98195, USA

Abstract

Principal curves are generally defined as smooth curves passing through the middle of the data and have wide applications in machine learning, for example in dimensionality reduction and feature extraction. Recently, Umut Ozertem and Deniz Erdogmus (O&E) provided a novel approach to defining principal curves and surfaces in their paper "Locally Defined Principal Curves and Surfaces" (2011). This report aims to reproduce the results of their paper and to provide a critical assessment of its performance, flaws, and merits.

1 Introduction

Perhaps the most popular dimension reduction tool used today is principal components analysis (PCA). PCA is an orthogonal linear transformation which transforms (rotates) the data into a new coordinate system such that the greatest variance of the projected data is associated with the first coordinate (the first principal component), the second greatest variance with the second coordinate, and so on. It is a commonly used dimension reduction tool because we can project the original data onto the subspace spanned by the first d principal components, thus preserving most of the variance of the data while reducing the dimension. As we are projecting onto a linear subspace, this can be described as linear dimension reduction.

Inherent to this method is that the first principal component is a line, the first and second principal components span a plane, and so on, and we rank these subspaces according to the amount of variance explained by each. The success of PCA as a linear dimension reduction technique raises the question: can we extend PCA to a non-linear set of principal surfaces that retain some of the desirable properties of principal lines?

2 Challenges

There are a number of challenges in non-linear dimension reduction in general; additionally, there is no agreed-upon definition of a principal curve and therefore no agreed method of estimating them. Popular non-linear dimension reduction (or manifold learning) techniques such as Isomap (Tenenbaum et al. [2000]), local linear embedding (Roweis and Saul [2000]), Laplacian eigenmaps (Belkin and Niyogi [2003]), and maximum variance unfolding (Weinberger and Saul [2006]) rely on generating locality information about the data samples from a data proximity graph. These techniques, however, depend on careful tuning of the parameters controlling the graph structure, as their accuracy depends on the quality of the graph. Furthermore, many of these techniques assume that the data truly lie on a manifold of inherent dimension d and try to recover that underlying manifold; the methods rely on the validity of this assumption.

There is currently no agreed-upon definition of principal curves, as they do not come about as naturally as principal components (lines). Principal curves are generally understood to be smooth curves passing through the middle of the data; however, a more mathematically precise definition is required. Many definitions take a property of the principal line and then try to find a smooth curve that satisfies this property: Hastie and Stuetzle [1989] require principal curves to be self-consistent, whereas Delicado [1998] restricts their total variance and conditional means.

After requiring that a principal curve satisfies some constraint, one then devises an algorithm that finds the principal curve meeting the requirement while minimizing a criterion (such as the mean squared projection error). These methods therefore aim to find a curve that best fits the data, which has two primary flaws. First, minimizing a data-weighted criterion leads to overfitting, so regularization is almost certainly required. A more philosophical issue is that principal curves ought to be thought of as inherent structures of the data generating mechanism that have to be approximated, as opposed to being defined as the solutions of an algorithm.

3 Method

Ozertem and Erdogmus (henceforth referred to as O&E) take a novel approach to principal curves by defining them as inherent structures of the underlying probability distribution of the data. They consider a principal surface defined such that every point on the surface is a local maximum (local mode) of the probability density in the local orthogonal subspace. In particular, this definition implies that the principal curve (the surface of dimension 1) is the ridge of the probability density, a natural result. Defining principal curves as structures of probability density functions leads to a differential-geometric definition depending on the gradient and Hessian of the density.

Let $p(x)$ be the density function of $x \in \mathbb{R}^n$, let $g(\cdot)$ be the gradient of $p(\cdot)$ and $H(\cdot)$ its Hessian. Let $(\lambda_i(x), q_i(x))$ be the $i$-th eigenvalue-eigenvector pair of $H(x)$. Let $\mathcal{C}^d$ be the set of points $x$ such that there exists a set $I \subset \{1, \dots, n\}$ with $|I| = n - d$ such that $g(x)^T q_i(x) = 0$ for all $i \in I$. We say $x$ is a regular point of $\mathcal{C}^d$ if the set $I$ is unique, that is, $g(x)$ is perpendicular to exactly $(n - d)$ eigenvectors. At any regular point $x$ of $\mathcal{C}^d$ the gradient is perpendicular to $n - d$ orthogonal eigenvectors, and therefore all such points must lie on a surface of intrinsic dimension $d$. Then define $\mathcal{P}^d$, the principal surface of dimension $d$, to be the set of regular points of $\mathcal{C}^d$ such that $\lambda_i(x) < 0$ for all $i \in I$; that is, $\mathcal{P}^d$ contains the local maxima in the orthogonal subspace $\mathcal{C}^d_\perp(x) = \mathrm{span}\{q_i(x) : i \in I\}$ (the gradient at these points projected onto $\mathcal{C}^d_\perp(x)$ is by definition zero).
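To make the definition concrete, the following is a minimal numerical sketch (our own illustration using NumPy and an assumed two-dimensional Gaussian density with diagonal covariance, not anything specified by O&E) that evaluates the gradient and Hessian at a point and checks the membership condition for $d = 1$: the gradient must be orthogonal to $n - d = 1$ Hessian eigenvector whose eigenvalue is negative. For this density the principal curve is the major principal axis.

```python
import numpy as np

# Illustrative 2-D Gaussian density with diagonal covariance; its principal
# curve (d = 1) is the major principal axis, i.e. the x1-axis here.
Sigma = np.diag([4.0, 1.0])
Sigma_inv = np.linalg.inv(Sigma)

def density(x):
    norm = 2 * np.pi * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * x @ Sigma_inv @ x) / norm

def gradient(x):
    return -density(x) * (Sigma_inv @ x)

def hessian(x):
    v = Sigma_inv @ x
    return density(x) * (np.outer(v, v) - Sigma_inv)

def on_principal_curve(x, tol=1e-10):
    """O&E condition for d = 1 in n = 2 dimensions: g(x) is orthogonal to
    n - d = 1 eigenvector of H(x), and that eigenvalue is negative."""
    g = gradient(x)
    lam, Q = np.linalg.eigh(hessian(x))
    ortho = np.abs(Q.T @ g) < tol        # eigenvectors orthogonal to g
    return bool(np.any(ortho & (lam < 0)))

print(on_principal_curve(np.array([1.5, 0.0])))  # on the major axis  -> True
print(on_principal_curve(np.array([1.5, 0.7])))  # off the axis       -> False
```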

To see why this definition is natural, consider $\mathcal{P}^0$. This is the set of points whose gradient is orthogonal to $n - 0 = n$ eigenvectors (that is, all of them), therefore $g(x) = 0$ and they are critical points; additionally they are the critical points associated with negative eigenvalues and are thus local maxima. Hence $\mathcal{P}^0$ is the set of local modes, and $\mathcal{P}^1$ is the principal curve (surface of dimension 1), which defines the ridge of the density function, i.e. there is only one direction of increase. This is a satisfying definition as it both defines principal curves as inherent structures of the data generating mechanism and naturally extends from principal curves (of dimension 1) to principal surfaces of arbitrary dimension $d > 1$. Additionally, these surfaces exist whenever the density function admits a gradient and Hessian. In practice kernel density estimation or parametric density estimation (e.g. mixture models) is used to estimate $p$, and these estimates are composed of densities with at least second-order derivatives.

3.1 Existence and Consistency of Principal Curves

Since this definition of principal curves depends on the first and second derivatives of the density, as long as these exist and the Hessian is non-zero the principal curves exist. These conditions are mild, and since in practice kernel densities are used, the gradient and Hessian are guaranteed to exist and be continuous, so the principal surfaces will exist. Chacón et al. [2011] show that, under the assumptions that the kernel bandwidth matrix converges to zero fast enough, that the underlying density and kernel have a sufficient number of continuous square-integrable derivatives, and that the kernel has finite covariance, the integrated mean squared error between the vector of order-r derivatives of the KDE and that of the true density converges to zero. Therefore, for a sufficiently smooth kernel and density, the derivatives of the KDE are consistent. Consequently, since the principal surfaces are defined by the first and second derivatives, they too must be consistent.

4 Algorithm

O&E present an adjustment of the mean-shift algorithm that they claim will converge to the principal surface of dimension $d$, i.e. $\mathcal{P}^d$. The algorithm is initialized with either a mesh of points or the data points themselves (the latter ensures that the resulting principal surface lies in the support of the data and is also the projection of the data onto the surface).

4.1 Monotonically Increasing Functions and Local Covariance

Instead of involving the Hessian $H(x)$ directly in their subspace constrained mean-shift algorithm, the authors define a new matrix called the local covariance. To motivate this definition, the authors show in their paper that:

Lemma 4.1  For strictly increasing, twice differentiable functions $f$, the principal set $\mathcal{P}^d$ of a density $p(x)$ is the same as the principal set $\mathcal{P}^d_f$ of the transformed density $f(p(x))$.

Let $x \in \mathcal{P}^d$ for a pdf $p(x)$ with gradient $g(x)$ and Hessian $H(x)$, and let $H(x) = Q \Lambda Q^T$ be its eigendecomposition. Since $x$ is a point of $\mathcal{P}^d$, its gradient $g(x)$ is orthogonal to all eigenvectors $q_i(x)$ with $i \in I$, whose span is the orthogonal space. Let $Q_\perp$ be the matrix whose columns are these eigenvectors, and let $Q_\parallel$ consist of the remaining eigenvectors, which span the parallel space. Then we may write $H(x) = Q_\perp \Lambda_\perp Q_\perp^T + Q_\parallel \Lambda_\parallel Q_\parallel^T$, where the $\Lambda$'s are the corresponding eigenvalues. Since $g(x)$ is orthogonal to the columns of $Q_\perp$ by definition, it must lie in the parallel space: $g(x) = Q_\parallel \beta$ for some weight vector $\beta$. We then calculate the gradient and Hessian of the transformed pdf $f(p(x))$:

$$ g_f(x) = f'(p(x))\, g(x) = f'(p(x))\, Q_\parallel \beta = Q_\parallel \big(f'(p(x))\, \beta\big) \equiv Q_\parallel \tilde\beta. $$

Therefore the gradient of the transformed density is also in the parallel space, and $x \in \mathcal{C}^d_f$ as well. For the Hessian,

$$ H_f(x) = f'(p(x))\, H(x) + f''(p(x))\, g(x) g(x)^T $$
$$ \qquad = f'(p(x)) \left[ Q_\perp \Lambda_\perp Q_\perp^T + Q_\parallel \Lambda_\parallel Q_\parallel^T \right] + f''(p(x))\, Q_\parallel \beta \beta^T Q_\parallel^T $$
$$ \qquad = \left\{ f'(p(x))\, Q_\parallel \Lambda_\parallel Q_\parallel^T + f''(p(x))\, Q_\parallel \beta \beta^T Q_\parallel^T \right\} + f'(p(x))\, Q_\perp \Lambda_\perp Q_\perp^T. $$

Then, since $f'(x) > 0$ for all $x$, the signs of the eigenvalues of the orthogonal space do not change, and as such $x \in \mathcal{P}^d_f$, as required.

Consider the special case $f = \log$ with $p(x)$ a Gaussian density. Then we have the property that $H_{\log}(x) = -\Sigma^{-1}$, where $\Sigma$ is the covariance matrix of the Gaussian distribution. This implies that when the underlying density is assumed to be Gaussian, the principal curve definition coincides with principal components. This leads O&E to define a local inverse covariance for any density $p(x)$ based on the above. It also gives us an ordering of the eigenvectors (as in PCA), such that we select the $n - d$ eigenvectors associated with the $n - d$ largest eigenvalues of the local inverse covariance:

$$ \Sigma^{-1}(x) \equiv -H_{\log}(x) = -\frac{H(x)}{p(x)} + \frac{g(x) g(x)^T}{p(x)^2}. $$

By Lemma 4.1, the principal surface defined using the local inverse covariance is identical to the principal surface defined by the Hessian. In fact, ignoring the rank-one gradient term, $\Sigma^{-1}(x) \approx -H(x)/p(x)$, so we may equally take the eigenvectors of $H(x)$ associated with the $n - d$ smallest eigenvalues of the Hessian.
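As a quick numerical sanity check of this special case (again our own sketch, using NumPy and the analytic gradient and Hessian of an assumed two-dimensional Gaussian), the local inverse covariance $-H(x)/p(x) + g(x)g(x)^T/p(x)^2$ recovers the constant matrix $\Sigma^{-1}$ at every point, which is exactly why the principal surfaces of a Gaussian reduce to PCA subspaces:

```python
import numpy as np

# Analytic density, gradient and Hessian of an assumed 2-D Gaussian N(0, Sigma).
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

def p(x):
    norm = 2 * np.pi * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * x @ Sigma_inv @ x) / norm

def g(x):
    return -p(x) * (Sigma_inv @ x)

def H(x):
    v = Sigma_inv @ x
    return p(x) * (np.outer(v, v) - Sigma_inv)

x = np.array([0.8, -1.3])                      # an arbitrary evaluation point
local_inv_cov = -H(x) / p(x) + np.outer(g(x), g(x)) / p(x) ** 2

# For a Gaussian the local inverse covariance equals Sigma^{-1} everywhere.
print(np.allclose(local_inv_cov, Sigma_inv))   # True
```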

Ghassabeh et al. [2013] point out that, while the motivation for the so-called local inverse covariance is its relationship to principal components when the underlying density is Gaussian, in practice the density used will never be exactly Gaussian, and so they argue that direct use of the Hessian or other estimates of the local covariance can be used with impunity. In fact, they prove convergence of the algorithm in all of these cases (that is, convergence in a finite number of steps, not necessarily to the principal surface). Simulations in their paper also indicate significant computational savings from using two local-covariance estimates defined by Wang and Carreira-Perpiñán [2010], with no worse performance (in terms of mean squared deviation from the underlying generative spiral) when compared to the local covariance as defined by O&E.

4.2 Mean Shift

In order to define an algorithm that converges to the principal surface of dimension $d$, $\mathcal{P}^d$, the authors adjust the well-known mean-shift algorithm; thus we briefly review what the mean-shift algorithm does and why it makes sense to adjust it for this purpose. The mean-shift algorithm is a general-purpose algorithm for finding local modes of data. It is often used for clustering (Comaniciu and Meer [2002]). It is a non-parametric method that assumes an underlying kernel density estimate. As we will be using the Gaussian kernel throughout the paper we specialize to this case. Define $k(x)$ to be the Gaussian profile, $k(x) = \exp(-\tfrac{1}{2} x)$ (different from the kernel in that the squaring operation is applied before passing the argument to $k$). Then we define the kernel density estimate of $p(x^t)$, the underlying distribution of $x^t \in \mathbb{R}^n$, based on $N$ i.i.d. samples $x_i \sim p(\cdot)$ as

$$ \hat p(x^t) \equiv \frac{1}{N h^n} \left(\frac{1}{2\pi}\right)^{n/2} \sum_{i=1}^N k\!\left( \left\| \frac{x^t - x_i}{h} \right\|^2 \right). $$

A local mode is a local maximum of the density function; thus, to find a local maximum we take the gradient of the density with respect to the point of interest $x^t$ and set it to zero.

Differentiating the kernel density estimate gives

$$ \nabla \hat p(x^t) \equiv g(x^t) = \frac{2}{N h^{n+2}} \left(\frac{1}{2\pi}\right)^{n/2} \sum_{i=1}^N (x^t - x_i)\, k'\!\left( \left\| \frac{x^t - x_i}{h} \right\|^2 \right) $$
$$ \qquad = \frac{1}{N h^{n+2}} \left(\frac{1}{2\pi}\right)^{n/2} \sum_{i=1}^N (x_i - x^t)\, k\!\left( \left\| \frac{x^t - x_i}{h} \right\|^2 \right) $$
$$ \qquad = \frac{1}{N h^{n+2}} \left(\frac{1}{2\pi}\right)^{n/2} \left[ \sum_{i=1}^N k\!\left( \left\| \frac{x^t - x_i}{h} \right\|^2 \right) \right] \left[ \frac{\sum_{i=1}^N k\!\left( \left\| \frac{x^t - x_i}{h} \right\|^2 \right) x_i}{\sum_{i=1}^N k\!\left( \left\| \frac{x^t - x_i}{h} \right\|^2 \right)} - x^t \right]. $$

The step in the second line follows because $k'(x) = -\tfrac{1}{2} k(x)$ for the Gaussian profile. The quantity in the second pair of brackets,

$$ m(x^t) \equiv \frac{\sum_{i=1}^N k\!\left( \left\| \frac{x^t - x_i}{h} \right\|^2 \right) x_i}{\sum_{i=1}^N k\!\left( \left\| \frac{x^t - x_i}{h} \right\|^2 \right)} - x^t, $$

is called the mean shift, and we iteratively seek a zero of the gradient by setting

$$ x^{t+1} = x^t + m(x^t), $$

the so-called mean-shift update. When this reaches a fixed point we have found a local mode.
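The following is a minimal sketch of the Gaussian mean-shift iteration just described (our own NumPy illustration with made-up data and bandwidth, not the authors' code): starting from a point, repeatedly add the mean-shift vector until it becomes negligible.

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed toy sample: two Gaussian clusters in the plane.
X = np.vstack([rng.normal(0.0, 0.3, size=(100, 2)),
               rng.normal(3.0, 0.3, size=(100, 2))])
h = 0.5                                       # illustrative bandwidth

def mean_shift_vector(x, X, h):
    """m(x): kernel-weighted sample mean minus x, for the Gaussian profile."""
    w = np.exp(-0.5 * np.sum(((x - X) / h) ** 2, axis=1))
    return w @ X / w.sum() - x

def mean_shift_mode(x0, X, h, tol=1e-7, max_iter=500):
    x = x0.astype(float)
    for _ in range(max_iter):
        m = mean_shift_vector(x, X, h)
        x = x + m                             # mean-shift update x <- x + m(x)
        if np.linalg.norm(m) < tol:           # fixed point => local mode of the KDE
            break
    return x

print(mean_shift_mode(np.array([2.5, 2.5]), X, h))   # converges near (3, 3)
```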

4.3 Deriving the Gradient and Hessian

As $p(x)$ is generally unknown, O&E assume an underlying kernel density estimate. The definition applies to any estimate of $p(x)$; presently we consider fixed-bandwidth kernel density estimators with a Gaussian kernel. Since we are using the same KDE as in the mean shift, the gradient is the same:

$$ g(x^t) = \frac{1}{N h^{n+2}} \left(\frac{1}{2\pi}\right)^{n/2} \left[ \sum_{i=1}^N k\!\left( \left\| \frac{x^t - x_i}{h} \right\|^2 \right) \right] \left[ \frac{\sum_{i=1}^N k\!\left( \left\| \frac{x^t - x_i}{h} \right\|^2 \right) x_i}{\sum_{i=1}^N k\!\left( \left\| \frac{x^t - x_i}{h} \right\|^2 \right)} - x^t \right], $$

which we can simplify to

$$ g(x^t) = \frac{\hat p(x^t)}{h^2}\, m(x^t). $$

Taking the second derivative we get the Hessian:

$$ H(x^t) = \frac{1}{N h^{n+2}} \left(\frac{1}{2\pi}\right)^{n/2} \sum_{i=1}^N k\!\left( \left\| \frac{x^t - x_i}{h} \right\|^2 \right) \left[ \frac{1}{h^2} (x^t - x_i)(x^t - x_i)^T - I_n \right] $$
$$ \qquad = \frac{\hat p(x^t)}{h^4} \left[ \frac{\sum_{i=1}^N k\!\left( \left\| \frac{x^t - x_i}{h} \right\|^2 \right) (x^t - x_i)(x^t - x_i)^T}{\sum_{i=1}^N k\!\left( \left\| \frac{x^t - x_i}{h} \right\|^2 \right)} - h^2 I_n \right] \equiv \frac{\hat p(x^t)}{h^4} \left\{ v(x^t) - h^2 I_n \right\}. $$

We are then trying to find points $x^t$ whose gradient is orthogonal to exactly $n - d$ eigenvectors of the Hessian; that is, we are looking for local modes in the orthogonal subspace as defined in Section 3. This leads to an adjustment of the mean-shift algorithm that the authors name the subspace constrained mean shift. It is similar to a projected-gradient (Goldstein [1964], Levitin and Polyak [1966]) version of the mean shift, where the mean-shift update $m(x)$ is projected onto the local orthogonal space before being used to update the trajectory $x^t$.

4.4 Subspace Constrained Mean Shift

Given the above idea of constraining the mean-shift update to the orthogonal subspace, the authors adjust the mean-shift algorithm to create the subspace constrained mean shift (SCMS) algorithm.

Subspace Constrained Mean Shift (SCMS) for a Gaussian KDE

Input: density estimate $\hat p(x)$, desired dimension $d$, tolerance $\epsilon > 0$.
Initialize: trajectories $x_1^0, \dots, x_K^0$ to a mesh or to the data points.
for $k = 1$ to $K$ do
    while not converged do
        1. $m(x_k^t) \leftarrow \dfrac{\sum_{i=1}^N k\left( \| (x_k^t - x_i)/h \|^2 \right) x_i}{\sum_{i=1}^N k\left( \| (x_k^t - x_i)/h \|^2 \right)} - x_k^t$   (evaluate the mean shift)
        2. $g(x_k^t) \leftarrow \dfrac{\hat p(x_k^t)}{h^2}\, m(x_k^t)$   (evaluate the gradient)
        3. $v(x_k^t) \leftarrow \dfrac{\sum_{i=1}^N k\left( \| (x_k^t - x_i)/h \|^2 \right) (x_k^t - x_i)(x_k^t - x_i)^T}{\sum_{i=1}^N k\left( \| (x_k^t - x_i)/h \|^2 \right)}$
        4. $H(x_k^t) \leftarrow \dfrac{\hat p(x_k^t)}{h^4} \left\{ v(x_k^t) - h^2 I_n \right\}$   (evaluate the Hessian)
        5. $\Sigma^{-1}(x_k^t) \leftarrow -\dfrac{1}{\hat p(x_k^t)} H(x_k^t) + \dfrac{1}{\hat p(x_k^t)^2} g(x_k^t) g(x_k^t)^T$   (evaluate the local inverse covariance)
        6. Perform the eigendecomposition $\Sigma^{-1}(x_k^t) = V \Lambda V^T$.
        7. $\bar V \leftarrow [v_1, \dots, v_{n-d}]$, the eigenvectors with the $(n - d)$ largest eigenvalues of $\Sigma^{-1}(x_k^t)$.
        8. $\hat m(x_k^t) \leftarrow \bar V \bar V^T m(x_k^t)$   (project $m(x_k^t)$ onto the orthogonal subspace)
        9. $x_k^{t+1} \leftarrow x_k^t + \hat m(x_k^t)$   (projected / subspace constrained mean-shift update)
        if $\left| g(x_k^t)^T \bar V \bar V^T g(x_k^t) \right| / \left( \| g(x_k^t) \| \, \| \bar V \bar V^T g(x_k^t) \| \right) < \epsilon$ then declare converged
    end while
end for

Note that each trajectory $k$ can be run without knowledge of the others (thus the for loop can be run in parallel). Parallelization decreases the computation time; however, the procedure is still inherently iterative within each trajectory and requires evaluating the kernel density as well as an eigendecomposition at each step. The algorithm is $O(N^2 n^3)$, where $N$ is the number of data points and $n$ is the dimension of the data.
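Below is a compact NumPy sketch of an SCMS trajectory following the steps above (a reimplementation for illustration only, with our own choice of toy data and bandwidth, not the authors' code). It evaluates the KDE, gradient, Hessian and local inverse covariance at the current point, projects the mean-shift vector onto the span of the $n - d$ leading eigenvectors of $\Sigma^{-1}(x)$, and iterates until the gradient is numerically orthogonal to that subspace.

```python
import numpy as np

rng = np.random.default_rng(1)
# Assumed toy data: noisy points around a half circle (a curve in the plane).
t = rng.uniform(0.0, np.pi, size=300)
X = np.c_[np.cos(t), np.sin(t)] + rng.normal(scale=0.05, size=(300, 2))
h, d = 0.2, 1                                  # illustrative bandwidth, target dimension
N, n = X.shape

def kde_parts(x):
    """Return p(x), g(x), H(x) and the kernel weights for a Gaussian KDE."""
    diff = x - X                               # (N, n)
    k = np.exp(-0.5 * np.sum((diff / h) ** 2, axis=1))
    const = (2 * np.pi) ** (-n / 2) / (N * h ** n)
    p = const * k.sum()
    g = -(const / h ** 2) * (k @ diff)         # = (p / h^2) * (weighted mean - x)
    v = (k[:, None, None] * diff[:, :, None] * diff[:, None, :]).sum(0) / k.sum()
    H = (p / h ** 4) * (v - h ** 2 * np.eye(n))
    return p, g, H, k

def scms_point(x0, tol=1e-6, max_iter=500):
    x = x0.astype(float)
    for _ in range(max_iter):
        p, g, H, k = kde_parts(x)
        m = k @ X / k.sum() - x                          # mean-shift vector
        Sig_inv = -H / p + np.outer(g, g) / p ** 2       # local inverse covariance
        lam, V = np.linalg.eigh(Sig_inv)
        V_perp = V[:, np.argsort(lam)[::-1][: n - d]]    # n-d largest eigenvalues
        x = x + V_perp @ (V_perp.T @ m)                  # constrained update
        # converged when g is (numerically) orthogonal to the orthogonal space
        if np.linalg.norm(V_perp.T @ g) < tol * np.linalg.norm(g):
            break
    return x

projected = np.array([scms_point(x) for x in X[:20]])    # project a few points
print(np.round(np.linalg.norm(projected, axis=1), 2))    # radii roughly 1 on the circle
```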

Thus, even when run in parallel, this algorithm can be slow for large data sets, especially in high dimension.

4.5 On the Convergence of SCMS

It should be noted that the authors claim convergence of the SCMS algorithm by relating it to the convergence of the mean-shift algorithm proposed in Comaniciu and Meer [2002]; however, Li et al. [2007] pointed out a fundamental mistake in the proof of the MS algorithm in Comaniciu and Meer [2002], so there is no proof of the optimality of the algorithm nor of whether it converges to the principal curve/surface. Recently, Ghassabeh et al. [2013] investigated the convergence properties of the SCMS algorithm. It was shown in Carreira-Perpiñán [2007] that if a Gaussian profile is used then the MS algorithm reduces to an EM algorithm and thus converges; however, use of other profiles does not guarantee convergence. Ghassabeh et al. [2013] point out that even if the MS algorithm converges, it is not obvious that this implies convergence of the SCMS, let alone to the desired principal surface. They do show, however, that the algorithm will converge (i.e. it will terminate) in a finite number of steps, though not necessarily to the correct surface.

5 Experiments

In order to examine the principal curve method proposed by O&E we perform a number of experiments. In the first two sections we compare O&E's principal curve method to the original Hastie & Stuetzle principal curve method as well as the method proposed by Kegl [1999]. In the third section we display the ability of O&E's principal curve method and the SCMS algorithm to handle more complicated data sets without changes. Finally, we perform a simulation study comparing the principal curve and wavelet denoising methods.

5.1 Standard Principal Curve Data Sets

In this section we examine how the principal curve algorithm defined by O&E performs on some standard principal curve data sets. This is primarily to ensure that the methods have been replicated accurately, and the results should look similar to those in the original paper. We compare this method to the methods proposed by Hastie & Stuetzle and Kegl.

5.1.1 Zig-Zag Data Set

Figure 1: The zig-zag data set is plotted as points, Hastie & Stuetzle's principal curve in yellow, and Kegl's polygonal line in blue [1].

Figure 1 plots the Hastie and Stuetzle curve and Kegl's polygonal line on the zig-zag data set [2]. Figure 2 plots the principal curve for a variety of values of the bandwidth parameter. Once a bandwidth is selected, the principal curve is found using the SCMS algorithm presented in Section 4.4. The algorithm is initialized at the original data points, so that the resulting curve is the projection of the data onto the principal curve as defined by O&E. The bandwidth parameters were chosen to display the importance of appropriate bandwidth selection, as the results can vary heavily with small changes.

[1] Computed using Kegl's Java application code.
[2] Data set provided by Kegl.

Figure 2: Zig-zag data set (points), O&E principal curve (blue), and an intensity map of the associated KDE estimate in green, indicating the different curves obtained for different bandwidths.

5.1.2 Spiral Data Set

Here we take the spiral data set and compare the three methods of finding principal curves. Figure 3 plots the Hastie & Stuetzle line as well as the polygonal line. Figure 4 plots the principal curve solution for different values of the bandwidth parameter. It should be noted that a bandwidth that performs well on one data set (e.g. the zig-zag data) does not necessarily perform well on another data set (e.g. the spiral data). Therefore, in practice it is advisable to use a data-dependent kernel bandwidth, for example selected by leave-one-out maximum likelihood as we do below.

Figure 3: The spiral data set is plotted as points, Hastie & Stuetzle's principal curve in yellow, and Kegl's polygonal line in blue.

Figure 4: Spiral data set (points), O&E principal curve (blue), and an intensity map of the associated KDE estimate in green, indicating the different curves obtained for different bandwidths.

5.2 Other Data Sets

In addition to the standard principal curve data sets, the principal curve method defined by O&E can handle arbitrarily complicated data sets, including those with self-intersections, bifurcations and loops, and these data sets do not require any alteration of the algorithm. Both of these are improvements over existing principal curve methods. Figure 5 shows the resulting principal curve on two complicated data sets with self-intersections: the first is a star shape and the second is an epitrochoid. Both have many loops and self-intersections but are handled by the SCMS algorithm with no additional changes.

Figure 5: The underlying data are plotted as points and the resulting O&E principal curve in blue. The left is a star and the right is an epitrochoid. These plots display the ability of principal curves as defined by the authors to handle complex data sets.

5.3 Signal Denoising

The authors claim that the principal curve method can be applied to the problem of denoising a signal. In Ozertem et al. [2008] they apply their method to piecewise-linear functions that have been corrupted by white noise, where they achieve some level of success but do not compare against other, more robust denoising methods. Here we consider a deterministic one-dimensional time signal D which has been corrupted by some form of i.i.d. mean-zero noise $\epsilon$; we let $\epsilon$ be Gaussian white noise. We define the signal D deterministically as a sum of sinusoids. The goal of any denoising method is, given a corrupted signal $X = D + \epsilon$, to estimate D. Naturally, as the variability of the noise increases this task becomes more challenging. This leads to the notion of the signal-to-noise ratio:

$$ \mathrm{SNR} = \frac{\|D\|^2}{E[\|\epsilon\|^2]}. $$

It is common in the signal processing literature to write the SNR in decibels, $\mathrm{SNR}\ (\mathrm{in\ dB}) = 10 \log_{10}(\mathrm{SNR})$, and we follow this standard.

5.3.1 Principal Curve Denoising

Here we follow Ozertem et al. [2008] in their paper on applying principal curve methods to denoise piecewise-linear signals. That is, we use the Gaussian kernel with a single bandwidth parameter, which we select by leave-one-out maximum likelihood cross-validation as in Leiva-Murillo and Rodríguez [2012]: we select the bandwidth that maximizes the leave-one-out likelihood $\prod_i \hat p_{-i}(x_i)$ over the entire data set, where $\hat p_{-i}$ is the kernel density estimate computed with the point $x_i$ excluded. The estimate of the true signal D is then $\hat D = \mathcal{P}^1$, i.e. the principal curve of the data set under the Gaussian KDE.
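A minimal sketch of this leave-one-out bandwidth selection rule follows (our own NumPy illustration; the data and the grid of candidate bandwidths are placeholders, and we maximize the sum of log leave-one-out densities, which is equivalent to maximizing the product):

```python
import numpy as np

def loo_log_likelihood(X, h):
    """Sum of log leave-one-out Gaussian-KDE densities, sum_i log p_{-i}(x_i)."""
    N, n = X.shape
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)   # pairwise squared dists
    K = np.exp(-0.5 * d2 / h ** 2)
    np.fill_diagonal(K, 0.0)                                    # leave each point out
    p_loo = K.sum(axis=1) / ((N - 1) * (2 * np.pi) ** (n / 2) * h ** n)
    return np.sum(np.log(p_loo))

def select_bandwidth(X, candidates):
    scores = [loo_log_likelihood(X, h) for h in candidates]
    return candidates[int(np.argmax(scores))]

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 1))                       # placeholder data
grid = np.linspace(0.05, 1.0, 40)                   # placeholder candidate bandwidths
print(select_bandwidth(X, grid))
```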

5.3.2 Wavelet Denoising

As much of the detail of wavelet denoising theory is beyond the scope of this paper, our discussion will be brief. The discrete wavelet transform (DWT) is an orthonormal transformation: for an orthonormal matrix $\mathcal{W}$ (defined by a choice of wavelet filter) we define the wavelet coefficients to be $W = \mathcal{W} X$. Since $\mathcal{W}$ is orthonormal, this representation both preserves energy (i.e. $\|W\|^2 = \|X\|^2$) and is an exact (alternate) representation of the signal, in that we can invert the transformation to recover the data: $X = \mathcal{W}^{-1} W = \mathcal{W}^T W$, where the last step follows from the orthonormality of $\mathcal{W}$. Wavelet denoising is a special case of the more general orthonormal transformation denoising. The methodology is simple:

1. Given a signal $X$ and a choice of wavelet filter, calculate the wavelet coefficients $W$.
2. Given a threshold $\delta$, set to zero all wavelet coefficients $W_t$ such that $|W_t| < \delta$.
3. Given the thresholded coefficients $W^{(T)}$, calculate the inverse transformation to arrive at the new signal $X^{(T)}$.

We then take $X^{(T)}$ to be our estimate of D. Based on the work of Donoho and Johnstone [1994] we use the universal threshold, which is derived under an assumption of Gaussian white noise (as in this case): $\delta^{(U)} = \sqrt{2 \hat\sigma_e^2 \log(N)}$. Since the variance of the noise is typically unknown, we use the median absolute deviation (MAD) estimate $\hat\sigma_{(\mathrm{MAD})} = \mathrm{median}\{|W_{1,1}|, \dots, |W_{1,N/2}|\} / 0.6745$ computed from the finest-scale coefficients, which under certain assumptions (Donoho and Johnstone [1994]) is a consistent estimate of $\sigma_e$.
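The following sketch implements the hard-thresholding recipe above under our own assumptions: it uses the PyWavelets package (pywt) with a Daubechies filter of our choosing, estimates $\sigma_e$ from the finest-scale detail coefficients via the MAD rule, and applies the universal threshold.

```python
import numpy as np
import pywt

def wavelet_denoise(x, wavelet="db4"):
    """Hard-threshold the DWT coefficients with the universal threshold."""
    coeffs = pywt.wavedec(x, wavelet)              # [cA_J, cD_J, ..., cD_1]
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745 # MAD estimate from finest scale
    delta = sigma * np.sqrt(2.0 * np.log(len(x)))  # universal threshold
    coeffs = [coeffs[0]] + [pywt.threshold(c, delta, mode="hard") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(x)]

# Illustrative use on a sinusoidal signal corrupted by Gaussian white noise.
rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 256)
D = np.sin(2 * np.pi * 4 * t) + 0.5 * np.sin(2 * np.pi * 9 * t)
X = D + rng.normal(scale=0.3, size=t.size)
print(np.mean((wavelet_denoise(X) - D) ** 2))      # MSE of the denoised signal
```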

5.3.3 Results

We fix a sample size N and generate the corrupted signal $X = D + \epsilon$, varying the SNR (a short sketch of this generation step appears at the end of this subsection). We then apply both the wavelet and principal curve methods to estimate D, and we evaluate their performance by the mean squared error (MSE) from the true signal D. We repeat this procedure 100 times for each sample size N and each SNR to provide Monte Carlo standard deviation bounds.

In Figure 6 we consider small sample sizes, N = 32 and N = 64. We plot log MSE to make the plots easier to read. In both cases we see that, in terms of mean squared error, the principal curve method outperforms the wavelet method at every signal-to-noise ratio.

Figure 6: The top panel on both sides plots the underlying (uncorrupted) signal for data sizes 32 and 64. The bottom panels plot the resulting log MSE for the principal curve method (black) and the wavelet method (blue) for varying SNRs.

What we see in Figures 7 and 8, however, is a change as we increase the sample size. For larger data sets the more theoretically well-founded wavelet denoising method vastly outperforms the principal curve method, which appears to stagnate around -8 log mean squared error regardless of sample size. There may be a variety of reasons for this. First, there exists a great deal of theory on wavelet denoising (and orthonormal transforms in general), whereas there is essentially none for principal curves. In particular, the threshold $\delta$ was derived assuming an underlying Gaussian white noise process, whereas the bandwidth selection method for the principal curve was a generic method for density estimation. That being said, the principal curve method still performs generally well, and additional theory (in particular on kernel and bandwidth selection in this specific signal denoising setting) could lead to an improved principal curve denoising method that is more comparable to state-of-the-art methods.
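For completeness, this is roughly how a corrupted signal at a prescribed SNR (in dB) can be generated for the simulation (our own sketch; the particular sinusoidal signal is a placeholder for the one actually used in the experiments):

```python
import numpy as np

def corrupt_at_snr(D, snr_db, rng):
    """Add Gaussian white noise so that ||D||^2 / E||eps||^2 hits the target SNR."""
    snr = 10.0 ** (snr_db / 10.0)
    noise_var = np.sum(D ** 2) / (len(D) * snr)    # per-sample noise variance
    return D + rng.normal(scale=np.sqrt(noise_var), size=D.shape)

rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 64)
D = np.sin(2 * np.pi * 3 * t) + 0.4 * np.sin(2 * np.pi * 7 * t)   # placeholder signal
for snr_db in (0, 5, 10, 15):
    X = corrupt_at_snr(D, snr_db, rng)
    print(snr_db, np.mean((X - D) ** 2))           # noise MSE shrinks as SNR grows
```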

Figure 7: The top panel on both sides plots the underlying (uncorrupted) signal for data sizes 128 and 256. The bottom panels plot the resulting log MSE for the principal curve method (black) and the wavelet method (blue) for varying SNRs.

In order to assess whether this test was biased towards the wavelet method due to the use of a true signal built from sinusoids, the experiment was re-run using a piecewise-linear signal as in Ozertem et al. [2008]. The resulting plots (similar to those in Figures 6 through 8) are in the supplementary appendix. In that case as well, the principal curve method outperforms the wavelet method for small data sets (N < 32), but by the time N > 64 the wavelet method outperforms the principal curve method (in terms of MSE). It should be noted that KDE-SCMS is $O(N^2)$ whereas the DWT is $O(N)$ (faster than the fast Fourier transform), though in principle KDE-SCMS can be parallelized, reducing the computational load.

Figure 8: The top plots the underlying (uncorrupted) signal for the largest data size considered. The bottom plots the resulting log MSE for the principal curve method (black) and the wavelet method (blue) for varying SNRs.

6 Significance

This method of defining principal curves and surfaces offers a number of advantages over existing methods. By defining principal surfaces as inherent structures of the underlying density rather than as solutions to an optimization criterion, O&E allow for a richer definition of principal curves. Furthermore, the method extends the existing principal curves literature by defining principal surfaces in a manner that extends naturally from principal curves of dimension 1 to principal surfaces of arbitrary dimension; currently no other definition of principal curves allows for this. Additionally, this definition allows principal curves to be found in data with loops, bifurcations and self-intersections without any changes to the definition or the algorithm.

In their definition of principal curves, O&E rely on a known density function p(x). In practice, of course, this is not available and must be estimated from data. O&E take the approach of approximating p(x) via kernel density estimation. This allows them to do a number of things. First, the smoothness constraints that are usually placed on principal curves can be removed by assuming that p(x) itself is smooth, resulting in inherently smooth principal curves; if p(x) is estimated via a smooth KDE, the result will be inherently smooth. Furthermore, these kernel density estimates (with a Gaussian kernel) always have second-order derivatives, so principal curves as defined by O&E are well defined and, under certain regularity conditions, consistent. Finally, issues of overfitting and outlier robustness can be handled in the density estimation phase (which has a much larger existing literature) rather than in the principal curve approximation phase.

While this method offers new insight into principal curves, its impact on manifold learning is less well supported. The subspace constrained mean shift (SCMS) algorithm presented by the authors may converge to a principal surface of dimension d, but the resulting points still live in the larger ambient dimension n > d.

Thus, while the points are guaranteed to lie on a surface of lower inherent dimension, this method alone cannot be used for dimension reduction unless it is paired with another suitable algorithm to parametrize the principal surface or to approximate it by projecting the points onto vectors of lower dimension. Much of the literature on manifold learning assumes that there is an underlying true manifold from which the data are generated, and these methods can assess their quality by asking whether they recover the true manifold given sufficient data. The principal surface method defined by O&E does not assume an underlying manifold (in fact there is no guarantee that the resulting principal surface is itself a manifold), and it is unknown whether the method would recover the underlying manifold if the data were generated from one, or whether the principal surface of appropriate dimension can be used as a reasonable estimate of the underlying manifold.

Nevertheless, principal curves as defined by O&E have enjoyed success in signal processing. In particular, they have been used in vector quantization (Ghassabeh et al. [2012]) as well as in signal denoising (Ozertem et al. [2008]). Recently, Zhang and Pedrycz [2014] proposed extending principal curves to granular principal curves in order to apply principal curves to large data sets by granulating the data. The method proposed by O&E opens the door to a new (potentially rich) principal curve/surface framework to be studied and applied.

References

M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6), 2003.

M. A. Carreira-Perpiñán. Gaussian mean shift is an EM algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, 2007.

J. E. Chacón, T. Duong, and M. P. Wand. Asymptotics for general multivariate kernel density derivative estimators. Statistica Sinica, in press, 2011.

D. Comaniciu and P. Meer. Mean shift: a robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5), 2002.

P. Delicado. Principal curves and principal oriented points, 1998.

D. L. Donoho and I. M. Johnstone. Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81, 1994.

Y. A. Ghassabeh, T. Linder, and G. Takahara. On noisy source vector quantization via a subspace constrained mean shift algorithm. Proceedings of the 26th Biennial Symposium on Communications, Kingston, Canada, 2012.

Y. A. Ghassabeh, T. Linder, and G. Takahara. On some convergence properties of the subspace constrained mean shift. Pattern Recognition, 46, 2013.

A. A. Goldstein. Convex programming in Hilbert spaces. Bulletin of the American Mathematical Society, 70, 1964.

T. Hastie and W. Stuetzle. Principal curves. Journal of the American Statistical Association, 84, 1989.

B. Kegl. Principal curves: learning, design, and applications. PhD thesis, Concordia University, Montreal, Canada, 1999.

J. P. Leiva-Murillo and A. A. Rodríguez. Algorithms for Gaussian bandwidth selection in kernel density estimators. Pattern Recognition Letters, 33(13), 2012.

E. S. Levitin and B. T. Polyak. Constrained minimization problems. USSR Computational Mathematics and Mathematical Physics, 6, pages 1-50, 1966.

X. Li, Z. Hu, and F. Wu. A note on the convergence of the mean shift. Pattern Recognition, 40, 2007.

U. Ozertem and D. Erdogmus. Local conditions for critical and principal manifolds. IEEE International Conference on Acoustics, Speech and Signal Processing.

U. Ozertem, D. Erdogmus, and O. Arikan. Piecewise smooth signal denoising via principal curve projections. IEEE International Conference on Machine Learning for Signal Processing, 2008.

S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2000.

J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2000.

W. Wang and M. A. Carreira-Perpiñán. Manifold blurring mean shift algorithms for manifold denoising. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2010), 2010.

K. Q. Weinberger and L. K. Saul. Unsupervised learning of image manifolds by semidefinite programming. International Journal of Computer Vision, 70(1), pages 77-90, 2006.

H. Zhang and W. Pedrycz. From principal curves to granular principal curves. IEEE Transactions on Cybernetics, 44(6), 2014.


More information

HOW TO DEAL WITH FFT SAMPLING INFLUENCES ON ADEV CALCULATIONS

HOW TO DEAL WITH FFT SAMPLING INFLUENCES ON ADEV CALCULATIONS HOW TO DEAL WITH FFT SAMPLING INFLUENCES ON ADEV CALCULATIONS Po-Ceng Cang National Standard Time & Frequency Lab., TL, Taiwan 1, Lane 551, Min-Tsu Road, Sec. 5, Yang-Mei, Taoyuan, Taiwan 36 Tel: 886 3

More information

Notes on Neural Networks

Notes on Neural Networks Artificial neurons otes on eural etwors Paulo Eduardo Rauber 205 Consider te data set D {(x i y i ) i { n} x i R m y i R d } Te tas of supervised learning consists on finding a function f : R m R d tat

More information

2.11 That s So Derivative

2.11 That s So Derivative 2.11 Tat s So Derivative Introduction to Differential Calculus Just as one defines instantaneous velocity in terms of average velocity, we now define te instantaneous rate of cange of a function at a point

More information

3.4 Worksheet: Proof of the Chain Rule NAME

3.4 Worksheet: Proof of the Chain Rule NAME Mat 1170 3.4 Workseet: Proof of te Cain Rule NAME Te Cain Rule So far we are able to differentiate all types of functions. For example: polynomials, rational, root, and trigonometric functions. We are

More information

Physically Based Modeling: Principles and Practice Implicit Methods for Differential Equations

Physically Based Modeling: Principles and Practice Implicit Methods for Differential Equations Pysically Based Modeling: Principles and Practice Implicit Metods for Differential Equations David Baraff Robotics Institute Carnegie Mellon University Please note: Tis document is 997 by David Baraff

More information

Chapter 2 Limits and Continuity

Chapter 2 Limits and Continuity 4 Section. Capter Limits and Continuity Section. Rates of Cange and Limits (pp. 6) Quick Review.. f () ( ) () 4 0. f () 4( ) 4. f () sin sin 0 4. f (). 4 4 4 6. c c c 7. 8. c d d c d d c d c 9. 8 ( )(

More information

New Distribution Theory for the Estimation of Structural Break Point in Mean

New Distribution Theory for the Estimation of Structural Break Point in Mean New Distribution Teory for te Estimation of Structural Break Point in Mean Liang Jiang Singapore Management University Xiaou Wang Te Cinese University of Hong Kong Jun Yu Singapore Management University

More information

Te comparison of dierent models M i is based on teir relative probabilities, wic can be expressed, again using Bayes' teorem, in terms of prior probab

Te comparison of dierent models M i is based on teir relative probabilities, wic can be expressed, again using Bayes' teorem, in terms of prior probab To appear in: Advances in Neural Information Processing Systems 9, eds. M. C. Mozer, M. I. Jordan and T. Petsce. MIT Press, 997 Bayesian Model Comparison by Monte Carlo Caining David Barber D.Barber@aston.ac.uk

More information

Learning based super-resolution land cover mapping

Learning based super-resolution land cover mapping earning based super-resolution land cover mapping Feng ing, Yiang Zang, Giles M. Foody IEEE Fellow, Xiaodong Xiuua Zang, Siming Fang, Wenbo Yun Du is work was supported in part by te National Basic Researc

More information

Order of Accuracy. ũ h u Ch p, (1)

Order of Accuracy. ũ h u Ch p, (1) Order of Accuracy 1 Terminology We consider a numerical approximation of an exact value u. Te approximation depends on a small parameter, wic can be for instance te grid size or time step in a numerical

More information

[db]

[db] Blind Source Separation based on Second-Order Statistics wit Asymptotically Optimal Weigting Arie Yeredor Department of EE-Systems, el-aviv University P.O.Box 3900, el-aviv 69978, Israel Abstract Blind

More information

Continuous Stochastic Processes

Continuous Stochastic Processes Continuous Stocastic Processes Te term stocastic is often applied to penomena tat vary in time, wile te word random is reserved for penomena tat vary in space. Apart from tis distinction, te modelling

More information

A SHORT INTRODUCTION TO BANACH LATTICES AND

A SHORT INTRODUCTION TO BANACH LATTICES AND CHAPTER A SHORT INTRODUCTION TO BANACH LATTICES AND POSITIVE OPERATORS In tis capter we give a brief introduction to Banac lattices and positive operators. Most results of tis capter can be found, e.g.,

More information

Volume 29, Issue 3. Existence of competitive equilibrium in economies with multi-member households

Volume 29, Issue 3. Existence of competitive equilibrium in economies with multi-member households Volume 29, Issue 3 Existence of competitive equilibrium in economies wit multi-member ouseolds Noriisa Sato Graduate Scool of Economics, Waseda University Abstract Tis paper focuses on te existence of

More information

Sin, Cos and All That

Sin, Cos and All That Sin, Cos and All Tat James K. Peterson Department of Biological Sciences and Department of Matematical Sciences Clemson University Marc 9, 2017 Outline Sin, Cos and all tat! A New Power Rule Derivatives

More information

Precalculus Test 2 Practice Questions Page 1. Note: You can expect other types of questions on the test than the ones presented here!

Precalculus Test 2 Practice Questions Page 1. Note: You can expect other types of questions on the test than the ones presented here! Precalculus Test 2 Practice Questions Page Note: You can expect oter types of questions on te test tan te ones presented ere! Questions Example. Find te vertex of te quadratic f(x) = 4x 2 x. Example 2.

More information

Topics in Generalized Differentiation

Topics in Generalized Differentiation Topics in Generalized Differentiation J. Marsall As Abstract Te course will be built around tree topics: ) Prove te almost everywere equivalence of te L p n-t symmetric quantum derivative and te L p Peano

More information

Fast Explicit and Unconditionally Stable FDTD Method for Electromagnetic Analysis Jin Yan, Graduate Student Member, IEEE, and Dan Jiao, Fellow, IEEE

Fast Explicit and Unconditionally Stable FDTD Method for Electromagnetic Analysis Jin Yan, Graduate Student Member, IEEE, and Dan Jiao, Fellow, IEEE Tis article as been accepted for inclusion in a future issue of tis journal. Content is final as presented, wit te exception of pagination. IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES 1 Fast Explicit

More information

arxiv: v1 [math.pr] 28 Dec 2018

arxiv: v1 [math.pr] 28 Dec 2018 Approximating Sepp s constants for te Slepian process Jack Noonan a, Anatoly Zigljavsky a, a Scool of Matematics, Cardiff University, Cardiff, CF4 4AG, UK arxiv:8.0v [mat.pr] 8 Dec 08 Abstract Slepian

More information