An Analysis of Locally Defined Principal Curves and Surfaces
James McQueen
Department of Statistics, University of Washington, Seattle, WA 98195, USA

Abstract

Principal curves are generally defined as smooth curves passing through the middle of the data and have wide applications in machine learning, for example in dimensionality reduction and feature extraction. Recently Umut Ozertem and Deniz Erdogmus (O&E) provided a novel approach for defining principal curves and surfaces in their paper Locally Defined Principal Curves and Surfaces (2011). This report aims to reproduce the results of their paper and to provide a critical assessment of its performance, flaws, and merits.

1 Introduction

Perhaps the most popular dimension reduction tool used today is principal components analysis (PCA). PCA is an orthogonal linear transformation which transforms (rotates) the data into a new coordinate system such that the greatest variance of the data projected onto each of the new coordinates is associated with the first coordinate (the first principal component), the second greatest variance with the second coordinate, and so on. This is a commonly used tool in dimension reduction, as we can project the original data onto the subspace spanned by the first d principal components, thus preserving most of the variance of the data while reducing the dimension. As we are projecting onto a linear space, this can be described as linear dimension reduction. Inherent to this method is that the first principal component is a line, the first and
second principal components create a plane, etc., and we rank these lines according to the amount of variance explained by each. The success of PCA as a linear dimension reduction technique begs the question: can we extend PCA to a non-linear set of principal surfaces that retain some of the desirable properties of principal lines?

2 Challenges

There are a number of challenges in non-linear dimension reduction in general; additionally, there is no agreed-upon definition of a principal curve and therefore no agreed method of estimating them. Popular non-linear dimension reduction (or manifold learning) techniques such as Isomap (Tenenbaum et al. [2000]), local linear embedding (Roweis and Saul [2000]), Laplacian eigenmaps (Belkin and Niyogi [2003]), and maximum variance unfolding (Weinberger and Saul [2006]) rely on generating locality information of data samples from a data proximity graph. These techniques, however, depend on careful tuning of the parameters controlling the graph structure, as the accuracy of these methods depends on the quality of the graph. Furthermore, many of these techniques assume that the data truly lie on a manifold of inherent dimension d and try to recover the underlying manifold; these methods rely on the validity of that assumption. There is currently no agreed-upon definition of principal curves, as they do not arise as naturally as principal components (lines). Principal curves are generally understood to be smooth curves passing through the middle of the data; however, a more mathematically precise definition is required. Many definitions take a property of the principal line and then try to find a smooth curve that fits that property: Hastie and Stuetzle [1989] require them to be self-consistent, whereas Delicado [1998] restricts their total variance and conditional means. After requiring that a principal curve satisfies some constraint, one then devises an algorithm that finds the principal curve that meets the requirement and minimizes
a criterion (such as mean-squared projection error). Therefore, these methods aim to find a curve that best fits the data. This has two primary flaws. First, trying to minimize some data-weighted criterion leads to overfitting, and so regularization is almost certainly required. A more philosophical issue is that principal curves ought to be thought of as inherent structures of the data-generating mechanism that have to be approximated, as opposed to being defined as the solutions to an algorithm.

3 Method

Ozertem and Erdogmus (henceforth referred to as O&E) take a novel approach to principal curves by defining them to be inherent structures of the underlying probability distribution of the data. They consider a principal surface defined such that every point on the principal surface is a local maximum (local mode) of the probability density in the local orthogonal subspace. In particular this definition implies that the principal curve (surface of dimension 1) is the ridge of the probability distribution, a natural result. Defining principal curves as structures of probability density functions leads to a differential-geometric definition depending on the gradient and Hessian of the probability density function. Let $p(x)$ be the density function of $x \in \mathbb{R}^n$, let $g(\cdot)$ be the gradient of $p(\cdot)$ and $H(\cdot)$ its Hessian. Let $(\lambda_i(x), q_i(x))$ be the $i$th eigenvalue-eigenvector pair of $H(x)$. Let $C^d$ be the set of points $x$ such that there exists a set $I_\perp \subset \{1, \dots, n\}$ with $|I_\perp| = n - d$ such that $g(x)^T q_i(x) = 0$ for all $i \in I_\perp$. We say $x$ is a regular point of $C^d$ if the set $I_\perp$ is unique, that is, $g(x)$ is perpendicular to exactly $(n-d)$ eigenvectors. Any regular point $x$ of $C^d$ is perpendicular to $n-d$ orthogonal eigenvectors, and therefore all such points must lie on a surface with intrinsic dimension $d$. Then define $P^d$, the principal surface of dimension $d$, to be the set of regular points of $C^d$ such that $\lambda_i(x) < 0$ for all $i \in I_\perp$; that is, $P^d$ contains the local maxima of the orthogonal subspace $C^d_\perp(x) = \mathrm{span}\{q_i(x) \mid i \in I_\perp\}$ (as the gradient at these points projected
onto $C^d_\perp(x)$ is by definition zero). To see why this definition is natural, consider $P^0$. This is the set of points that are orthogonal to $n - 0 = n$ eigenvectors (that is, all of them), therefore $g(x) = 0$, so they are critical points; additionally, they are the critical points associated with negative eigenvalues and are thus local maxima. Hence $P^0$ is the set of local modes, and $P^1$ is the principal curve (surface of dimension 1) and defines the ridge of the density function, i.e. there is only one direction of increase. This is a satisfying definition, as it both defines principal curves as inherent structures of the data-generating mechanism and naturally extends from principal curves (of dimension 1) to principal surfaces of arbitrary dimension $d > 1$. Additionally, these surfaces exist whenever the density function admits a gradient and Hessian. In practice kernel density estimation or parametric density estimation (e.g. mixture models) is used to estimate $p$, and these are composed from densities with at least second-order derivatives.

3.1 Existence and Consistency of Principal Curves

Since this definition of principal curves depends on the first and second derivatives of the density, the principal curve exists as long as these derivatives exist and the Hessian is non-zero. These conditions are mild, and since in practice kernel densities are used, the gradient and Hessian are guaranteed to exist and be continuous, so the principal surfaces will exist. Chacón et al. [2011] show that, under the assumptions that the kernel bandwidth matrix converges to zero fast enough, that the underlying density and kernel have a sufficient number of continuous square-integrable derivatives, and that the kernel has finite covariance, the integrated mean-squared error between the vector of order-r derivatives of the KDE and that of the true density converges to zero. Therefore, for a sufficiently smooth kernel and density, the derivatives of the KDE are consistent.
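Since everything rests on the density's first two derivatives existing in closed form for a Gaussian KDE, a quick numerical check is instructive. The sketch below (not from the paper; the data, bandwidth, and test point are arbitrary illustrative choices) verifies the closed-form Gaussian-KDE gradient against a finite-difference approximation:

```python
import numpy as np

def kde(x, data, h):
    """Gaussian KDE value at point x."""
    n = data.shape[1]
    u = np.sum((x - data) ** 2, axis=1) / h ** 2
    c = (2 * np.pi) ** (-n / 2) / (len(data) * h ** n)
    return c * np.sum(np.exp(-0.5 * u))

def kde_grad(x, data, h):
    """Closed-form gradient: proportional to sum_i (x_i - x) k(||(x - x_i)/h||^2)."""
    n = data.shape[1]
    u = np.sum((x - data) ** 2, axis=1) / h ** 2
    w = np.exp(-0.5 * u)
    c = (2 * np.pi) ** (-n / 2) / (len(data) * h ** (n + 2))
    return c * np.sum(w[:, None] * (data - x), axis=0)

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2))
x, h = np.array([0.3, -0.2]), 0.5

# Central-difference check of the analytic gradient.
eps = 1e-6
fd = np.array([(kde(x + eps * e, data, h) - kde(x - eps * e, data, h)) / (2 * eps)
               for e in np.eye(2)])
assert np.allclose(fd, kde_grad(x, data, h), atol=1e-8)
```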
Consequently, since the principal surfaces are defined by the first and second derivatives, they too must be consistent.
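As a concrete check of the definition in Section 3, consider a single 2-d Gaussian, where the principal curve should be the major principal axis. The sketch below (an illustration under that assumption, with an arbitrary diagonal covariance) verifies that a point on the major axis has its gradient orthogonal to exactly one Hessian eigenvector with a negative eigenvalue, so it lies in $P^1$:

```python
import numpy as np

Sigma = np.diag([4.0, 1.0])           # major axis along the first coordinate
Sinv = np.linalg.inv(Sigma)

def density(x):
    c = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))
    return c * np.exp(-0.5 * x @ Sinv @ x)

def grad(x):
    # g(x) = -p(x) * Sigma^{-1} x for a mean-zero Gaussian
    return -density(x) * (Sinv @ x)

def hessian(x):
    # H(x) = p(x) * (Sigma^{-1} x x^T Sigma^{-1} - Sigma^{-1})
    v = Sinv @ x
    return density(x) * (np.outer(v, v) - Sinv)

x = np.array([1.5, 0.0])              # a point on the major axis
lam, Q = np.linalg.eigh(hessian(x))
g = grad(x)
# The gradient is orthogonal to exactly one eigenvector (n - d = 1),
# and that eigenvector carries a negative eigenvalue: x is in P^1.
dots = np.abs(Q.T @ g)
i = int(np.argmin(dots))
assert dots[i] < 1e-12 and lam[i] < 0
```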
4 Algorithm

O&E present an adjustment to the mean-shift algorithm that they claim will converge to the principal surface of dimension d, i.e., $P^d$. In the algorithm we initialize with either a mesh of points or the data points themselves (the latter ensures the resulting principal surface will be in the support of the data and is also the projection of the data onto the surface).

4.1 Monotonically Increasing Functions and Local Covariance

Instead of involving the Hessian $H(x)$ in their subspace-constrained mean-shift algorithm, the authors define a new matrix called the local covariance. To motivate this definition, in their paper the authors show that:

Lemma 4.1 For strictly increasing, twice differentiable functions $f$, the principal set $P^d$ of a density $p(x)$ is the same as the principal set $P^d_f$ of the transformed density $f(p(x))$.

Let $x \in P^d$ with pdf $p(x)$, gradient $g(x)$ and Hessian $H(x)$. Let $H(x) = Q \Lambda Q^T$ be the eigendecomposition. Since $x$ is a point in $P^d$, its gradient $g(x)$ is orthogonal to all eigenvectors $q_i(x)$ in the set $I_\perp$, whose span is the orthogonal space. Let $Q_\perp$ be the matrix whose columns are composed of these eigenvectors, and let $Q_\parallel$ be composed of the remaining eigenvectors that span the parallel space. Then we may write $H(x) = Q_\perp \Lambda_\perp Q_\perp^T + Q_\parallel \Lambda_\parallel Q_\parallel^T$, where the $\Lambda$'s are the corresponding eigenvalues. Since $g(x)$ is orthogonal to the vectors in $Q_\perp$ by definition, it must lie in the parallel space: $g(x) = Q_\parallel \beta$ for some weight vector $\beta$. We then calculate the gradient and Hessian of the transformed pdf $f(p(x))$:

$$g_f(x) = f'(p(x))\, g(x) = f'(p(x))\, Q_\parallel \beta = Q_\parallel \big(f'(p(x))\, \beta\big) \in \mathrm{span}(Q_\parallel)$$
Therefore the gradient is also in the parallel space, and $x \in C^d_f$ as well. For the Hessian,

$$H_f(x) = f'(p(x))\, H(x) + f''(p(x))\, g(x) g(x)^T$$
$$= f'(p(x)) \left[ Q_\perp \Lambda_\perp Q_\perp^T + Q_\parallel \Lambda_\parallel Q_\parallel^T \right] + f''(p(x))\, Q_\parallel \beta \beta^T Q_\parallel^T$$
$$= \left\{ f'(p(x))\, Q_\parallel \Lambda_\parallel Q_\parallel^T + f''(p(x))\, Q_\parallel \beta \beta^T Q_\parallel^T \right\} + f'(p(x))\, Q_\perp \Lambda_\perp Q_\perp^T$$

Then since $f'(x) > 0$ for all $x$, the signs of the eigenvalues of the orthogonal space do not change, and as such $x \in P^d_f$, as required.

Consider the special case $f = \log$ with $p(x)$ the Gaussian density. Since $\log p(x) = c - \tfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)$, we have the property that

$$H_{\log}(x) = -\Sigma^{-1},$$

where $\Sigma$ is the covariance matrix of the Gaussian distribution. This implies that when the underlying density is assumed to be Gaussian, the principal curve definition coincides with principal components. This leads O&E to define a local (inverse) covariance for any distribution $p(x)$ based on the above. It also gives an ordering of the eigenvectors (as in PCA), such that we select the $n-d$ eigenvectors associated with the $n-d$ largest eigenvalues of the local inverse covariance:

$$\Sigma^{-1}(x) \equiv -H_{\log}(x) = -\frac{H(x)}{p(x)} + \frac{g(x) g(x)^T}{p(x)^2}$$

By Lemma 4.1 the principal surface defined using the local covariance is identical to the principal surface defined by the Hessian ($\log$ is strictly increasing). In fact, at points where the gradient is orthogonal to the relevant eigenvectors, the eigenvalues of the Hessian are just $-p(x)$ times the eigenvalues of the local inverse covariance, so we may equally take the eigenvectors of $H(x)$ associated with the $n-d$ smallest eigenvalues of the Hessian. Ghassabeh et al. [2013] point out that while the motivation for the so-called local inverse covariance is its relationship to principal components when the underlying density is Gaussian, in practice the density used will never be Gaussian, and so they argue that direct
use of the Hessian or other estimates of the local covariance can be used with impunity. In fact, they prove convergence of the algorithm in all of these cases (that is, convergence in a finite number of steps, not necessarily to the principal surface). Simulations in their paper also indicate significant computational savings using two local-covariance estimates defined by Wang and Carreira-Perpiñán [2010], with no worse performance (in terms of mean-squared deviation from the underlying generative spiral) when compared to the local covariance as defined by O&E.

4.2 Mean-shift

In order to define an algorithm that converges to the principal surface of dimension $d$, $P^d$, the authors adjust the well-known mean-shift algorithm; thus we briefly review what the mean-shift algorithm does and why it makes sense to adjust it for this purpose. The mean-shift algorithm is a general-purpose algorithm for finding local modes in data. It is often used for clustering (Comaniciu and Meer [2002]). It is a non-parametric method that assumes an underlying kernel density estimate. As we will be using the Gaussian kernel throughout the paper, we specialize to this case. Define $k(x)$ to be the Gaussian profile, $k(x) = \exp(-\tfrac{1}{2}x)$ (different from the kernel in that the squaring operation is done before passing the argument to $k$). Then we define the kernel density estimate of $p(x^t)$, the underlying distribution of $x^t \in \mathbb{R}^n$, based on $N$ i.i.d. samples $x_i \sim p(\cdot)$, as:

$$\hat{p}(x^t) = \frac{1}{N h^n}\left(\frac{1}{2\pi}\right)^{n/2} \sum_{i=1}^{N} k\!\left(\left\|\frac{x^t - x_i}{h}\right\|^2\right)$$

A local mode is a local maximum of the density function; thus, to find a local maximum we take the gradient of the density function with respect to the point of interest $x^t$ and set it
to zero:

$$\hat{g}(x^t) = \frac{2}{N h^{n+2}}\left(\frac{1}{2\pi}\right)^{n/2} \sum_{i=1}^{N} (x^t - x_i)\, k'\!\left(\left\|\frac{x^t - x_i}{h}\right\|^2\right)$$
$$= \frac{1}{N h^{n+2}}\left(\frac{1}{2\pi}\right)^{n/2} \sum_{i=1}^{N} (x_i - x^t)\, k\!\left(\left\|\frac{x^t - x_i}{h}\right\|^2\right)$$
$$= \frac{1}{N h^{n+2}}\left(\frac{1}{2\pi}\right)^{n/2} \left[\sum_{i=1}^{N} k\!\left(\left\|\frac{x^t - x_i}{h}\right\|^2\right)\right] \left[\frac{\sum_{i=1}^{N} k\!\left(\left\|\frac{x^t - x_i}{h}\right\|^2\right) x_i}{\sum_{i=1}^{N} k\!\left(\left\|\frac{x^t - x_i}{h}\right\|^2\right)} - x^t\right]$$

The step in the second line follows as $k'(x) = -\tfrac{1}{2} k(x)$ for the special case of the Gaussian profile. The quantity in the final square brackets,

$$m(x^t) \equiv \frac{\sum_{i=1}^{N} k\!\left(\left\|\frac{x^t - x_i}{h}\right\|^2\right) x_i}{\sum_{i=1}^{N} k\!\left(\left\|\frac{x^t - x_i}{h}\right\|^2\right)} - x^t,$$

is called the mean-shift, and we iteratively drive the gradient to zero by setting

$$x^{t+1} = x^t + m(x^t),$$

the so-called mean-shift update. When this becomes a fixed point, we have found a local mode.

4.3 Deriving the Gradient and Hessian

As $p(x)$ is generally unknown, O&E assume an underlying kernel density estimate. The definition applies for any estimate of $p(x)$; however, presently we consider fixed-bandwidth kernel density estimators with a Gaussian kernel. Since we are using the same KDE as in
the mean-shift, the gradient is the same, and it simplifies to

$$\hat{g}(x^t) = \frac{\hat{p}(x^t)}{h^2}\, m(x^t).$$

Taking the second derivative we get the Hessian:

$$\hat{H}(x^t) = \frac{1}{N h^{n+2}}\left(\frac{1}{2\pi}\right)^{n/2} \sum_{i=1}^{N} k\!\left(\left\|\frac{x^t - x_i}{h}\right\|^2\right) \left[\frac{1}{h^2}(x^t - x_i)(x^t - x_i)^T - I_n\right]$$
$$= \frac{\hat{p}(x^t)}{h^4} \left\{ v(x^t) - h^2 I_n \right\}, \qquad v(x^t) \equiv \frac{\sum_{i=1}^{N} k\!\left(\left\|\frac{x^t - x_i}{h}\right\|^2\right)(x^t - x_i)(x^t - x_i)^T}{\sum_{i=1}^{N} k\!\left(\left\|\frac{x^t - x_i}{h}\right\|^2\right)}$$

Then we are trying to find points $x^t$ whose gradient is orthogonal to exactly $n-d$ eigenvectors of the Hessian; that is, we are looking for local modes in the orthogonal subspace as defined in Section 3. This leads to an adjustment of the mean-shift algorithm that the authors name the subspace-constrained mean-shift algorithm. It is similar to a projected-gradient (Goldstein [1964], Levitin and Polyak [1966]) version of the mean-shift, where the mean-shift update $m(x)$ is projected into the local orthogonal space before being used to update the trajectory $x^t$.
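The plain mean-shift iteration that SCMS builds on can be sketched as follows (a minimal illustration assuming a Gaussian profile and fixed bandwidth; the two-cluster data and all parameters are arbitrary choices for the demo):

```python
import numpy as np

def mean_shift_update(x, data, h):
    """One Gaussian mean-shift step: move x to the kernel-weighted mean."""
    w = np.exp(-0.5 * np.sum((x - data) ** 2, axis=1) / h ** 2)
    return w @ data / w.sum()

rng = np.random.default_rng(1)
# Two well-separated 2-d clusters; mean-shift should climb to the nearby mode.
data = np.vstack([rng.normal([0, 0], 0.3, (100, 2)),
                  rng.normal([5, 5], 0.3, (100, 2))])
x = np.array([0.8, 0.9])
for _ in range(100):
    x_new = mean_shift_update(x, data, 0.5)
    if np.linalg.norm(x_new - x) < 1e-8:   # fixed point => local mode
        break
    x = x_new

# The trajectory converges near the mode of the first cluster.
assert np.linalg.norm(x - np.array([0.0, 0.0])) < 0.5
```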
4.4 Subspace-Constrained Mean-shift

Given the above idea of constraining the mean-shift update to the orthogonal subspace, the authors adjust the mean-shift algorithm to create the subspace-constrained mean-shift algorithm.

Subspace-Constrained Mean-Shift (SCMS) for a Gaussian KDE
Input: density estimate $\hat{p}(x)$, desired dimension $d$, tolerance $\epsilon > 0$.
Initialize: trajectories $x_1^0, \dots, x_K^0$ to a mesh or to the data points.
for k = 1 to K do
  while not converged do
    1. $m(x_k^t) \leftarrow \dfrac{\sum_{i=1}^N k(\|(x_k^t - x_i)/h\|^2)\, x_i}{\sum_{i=1}^N k(\|(x_k^t - x_i)/h\|^2)} - x_k^t$  (evaluate mean-shift)
    2. $g(x_k^t) \leftarrow \dfrac{\hat{p}(x_k^t)}{h^2}\, m(x_k^t)$  (evaluate gradient)
    3. $v(x_k^t) \leftarrow \dfrac{\sum_{i=1}^N k(\|(x_k^t - x_i)/h\|^2)\,(x_k^t - x_i)(x_k^t - x_i)^T}{\sum_{i=1}^N k(\|(x_k^t - x_i)/h\|^2)}$
    4. $H(x_k^t) \leftarrow \dfrac{\hat{p}(x_k^t)}{h^4}\{v(x_k^t) - h^2 I_n\}$  (evaluate Hessian)
    5. $\Sigma^{-1}(x_k^t) \leftarrow -\dfrac{1}{\hat{p}(x_k^t)} H(x_k^t) + \dfrac{1}{\hat{p}(x_k^t)^2} g(x_k^t) g(x_k^t)^T$  (evaluate local inverse covariance)
    6. Perform the eigendecomposition $\Sigma^{-1}(x_k^t) = V \Lambda V^T$.
    7. $V_\perp \leftarrow [v_1, \dots, v_{n-d}]$, the eigenvectors with the $(n-d)$ largest eigenvalues of $\Sigma^{-1}(x_k^t)$.
    8. $\hat{m}(x_k^t) \leftarrow V_\perp V_\perp^T m(x_k^t)$  (project $m(x_k^t)$ onto the orthogonal subspace)
    9. $x_k^{t+1} \leftarrow \hat{m}(x_k^t) + x_k^t$  (projected/subspace-constrained mean-shift update)
    if $\dfrac{g(x_k^t)^T V_\perp V_\perp^T g(x_k^t)}{\|g(x_k^t)\|\,\|V_\perp^T g(x_k^t)\|} < \epsilon$ then declare converged
  end while
end for

Note that each trajectory $k$ can be run individually without knowledge of the others (thus the for loop can be run in parallel). Parallelization will decrease computation time; however, the procedure is still inherently iterative in each trajectory and requires evaluating the kernel density as well as an eigendecomposition at each step. The algorithm is $O(N^2 n^3)$, where $N$ is the number of data points and $n$ is the dimension of
the data. Thus, even when run in parallel, for large data sets (especially of large dimension) this algorithm can be slow.

4.5 On the Convergence of SCMS

It should be noted that the authors claim convergence of the SCMS algorithm by relation to the convergence of the mean-shift algorithm proposed in Comaniciu and Meer [2002]; however, Li et al. [2007] pointed out a fundamental mistake in the proof of the MS algorithm in Comaniciu and Meer [2002], so there is no proof of the optimality of the algorithm or of whether it converges to the principal curve/surface. Recently, Ghassabeh et al. [2013] investigated the convergence properties of the SCMS algorithm. It was shown in Carreira-Perpiñán [2007] that if a Gaussian profile is used then the MS algorithm reduces to an EM algorithm and thus converges; however, use of other profiles does not guarantee convergence. Ghassabeh et al. [2013] point out that even if the MS converges, it is not obvious that this implies convergence of the SCMS, let alone to the desired principal surface. They do show, however, that the algorithm will converge (i.e. it will end) in a finite number of steps, though not necessarily to the correct surface.

5 Experiments

In order to examine the principal curve method proposed by O&E, we perform a number of experiments. In the first two sections we compare O&E's principal curve method to the original Hastie & Stuetzle principal curve method as well as the method proposed by Kégl [1999]. In the third section we display the robustness of O&E's principal curve method and the ability of the SCMS algorithm to handle more complicated data sets without changes. Finally, we perform a simulation study to compare the principal curve and wavelet denoising methods.
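Before turning to the experiments, the SCMS loop of Section 4.4 can be sketched compactly. This is an illustrative NumPy implementation, not the authors' code: it drops constant factors (which do not affect the eigenvectors of the local inverse covariance), uses a fixed iteration budget instead of the angular stopping rule, and runs a single trajectory on synthetic data along the line $y = x$:

```python
import numpy as np

def scms_step(x, data, h, d=1):
    """One subspace-constrained mean-shift step for a Gaussian KDE.

    Projects the mean-shift vector onto the span of the (n - d) eigenvectors
    of the local inverse covariance with the largest eigenvalues."""
    n = data.shape[1]
    diff = x - data                                        # (N, n)
    w = np.exp(-0.5 * np.sum(diff ** 2, axis=1) / h ** 2)  # kernel weights
    m = w @ data / w.sum() - x                             # mean-shift vector
    v = (w[:, None] * diff).T @ diff / w.sum()             # local second moment
    # Local inverse covariance, up to the positive factor 1/h^4:
    # Sigma^{-1} = (h^2 I - v + m m^T) / h^4
    sinv = h ** 2 * np.eye(n) - v + np.outer(m, m)
    lam, Q = np.linalg.eigh(sinv)                          # ascending eigenvalues
    V = Q[:, -(n - d):]                                    # n - d largest
    return x + V @ V.T @ m                                 # constrained update

rng = np.random.default_rng(2)
# Noisy samples along the line y = x: the principal curve is (close to) that line.
t = rng.uniform(-2, 2, 300)
data = np.column_stack([t, t]) + rng.normal(0, 0.1, (300, 2))
x = np.array([0.5, 0.9])
for _ in range(200):
    x = scms_step(x, data, h=0.3, d=1)

# The point is pulled onto the density ridge near the line y = x.
assert abs(x[0] - x[1]) < 0.1
```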
5.1 Standard Principal Curve Data Sets

In this section we examine how the principal curve algorithm defined by O&E performs on some standard principal curve data sets. This is primarily to ensure that the methods have been replicated accurately, and the results should look similar to those in the original paper. We compare this method to the methods proposed by Hastie & Stuetzle and Kégl.

5.1.1 Zig-Zag Data Set

Figure 1: The zig-zag data set is plotted as points, Hastie & Stuetzle's principal curve in yellow, and Kégl's polygonal line in blue.

Figure 1 plots the Hastie & Stuetzle curve and Kégl's polygonal line algorithm on the zig-zag data set (the polygonal line was computed using Kégl's Java application code; the data set was provided by Kégl). Figure 2 plots the principal curve for a variety of values of the bandwidth parameter. Once a bandwidth is selected, the principal curve is found using the SCMS algorithm presented in Section 4.4. The algorithm is initialized at the original data points, such that the resulting curve is the projection of the data onto the principal curve as defined by O&E. The bandwidth parameters were chosen to display the importance of appropriate bandwidth selection, as the results can vary heavily with small changes.
Figure 2: Zig-zag data set (points) and O&E principal curve (blue); an intensity map of the associated KDE is shown in green, indicating the different curves obtained for different bandwidths.

5.1.2 Spiral Data Set

Here we take the spiral data set and compare the three methods of finding principal curves. Figure 3 plots the Hastie & Stuetzle line as well as the polygonal line. Figure 4 plots the principal curve solution for different values of the bandwidth parameter. It should be noted that a bandwidth that performs well on one data set (e.g. the zig-zag data) does not necessarily perform well on another data set (e.g. the spiral data). Therefore, in practice it is advisable to use a data-dependent kernel bandwidth, for example selected by leave-one-out maximum likelihood as we do below.

Figure 3: The spiral data set is plotted as points, Hastie & Stuetzle's principal curve in yellow, and Kégl's polygonal line in blue.
Figure 4: Spiral data set (points) and O&E principal curve (blue); an intensity map of the associated KDE is shown in green, indicating the different curves obtained for different bandwidths.

5.2 Other Data Sets

In addition to the standard principal curve data sets, the principal curve method defined by O&E can handle arbitrarily complicated data sets, including those with self-intersections, bifurcations, and loops, and these data sets do not require alteration of the algorithm. Both of these are improvements over existing principal curve methods. Figure 5 shows the resulting principal curve on two complicated data sets with self-intersections: the first is a star shape and the second is an epitrochoid. Although these data sets have many self-loops and intersections, they are handled by the SCMS algorithm with no additional changes.

Figure 5: The underlying data are plotted as points and the resulting O&E principal curve in blue. The left is a star and the right is an epitrochoid. These plots display the ability of principal curves as defined by the authors to handle complex data sets.
5.3 Signal Denoising

The authors claim that the principal curve method can be applied to the problem of denoising a signal. In Ozertem et al. [2008] they apply their method to piece-wise linear functions that have been corrupted by white noise, where they achieve some level of success but do not compare to other, more robust denoising methods. Here we consider a deterministic one-dimensional time signal $D$ which has been corrupted by some form of i.i.d. mean-zero noise $\epsilon$; we let $\epsilon$ be Gaussian white noise. We define the signal $D$ deterministically as a function of sinusoids. The goal of any denoising method is, given a corrupted signal $X = D + \epsilon$, to estimate $D$. Naturally, as the variability of the noise increases, this task becomes more challenging. This leads to the notion of a signal-to-noise ratio:

$$\mathrm{SNR} = \frac{\|D\|^2}{E[\|\epsilon\|^2]}$$

It is common in the signal processing literature to express the SNR in decibels, $\mathrm{SNR}\ (\text{in dB}) = 10 \log_{10}(\mathrm{SNR})$, and we follow this standard.

5.3.1 Principal Curve Denoising

Here we follow Ozertem et al. [2008] in applying principal curve methods to denoising. That is, we use the Gaussian kernel with a single bandwidth parameter, selected by leave-one-out maximum likelihood cross-validation as in Leiva-Murillo and Rodríguez [2012]: we select the bandwidth that maximizes $\prod_i \Pr(x_i \mid x_{-i})$ over the entire data set, where $x_{-i}$ is the data set excluding the point $i$. The estimated true signal is $\hat{D} = P^1$, i.e. the principal curve for the data set under the Gaussian KDE.
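The leave-one-out maximum-likelihood bandwidth selection described above can be sketched as follows (a minimal 1-d illustration; the candidate grid and synthetic data are arbitrary choices, and the product of leave-one-out densities is maximized on the log scale):

```python
import numpy as np

def loo_log_likelihood(data, h):
    """Sum of log leave-one-out Gaussian KDE densities p(x_i | x_{-i})."""
    n = len(data)
    d2 = (data[:, None] - data[None, :]) ** 2
    K = np.exp(-0.5 * d2 / h ** 2) / (h * np.sqrt(2 * np.pi))
    np.fill_diagonal(K, 0.0)                 # exclude each point from its own fit
    return np.sum(np.log(K.sum(axis=1) / (n - 1)))

rng = np.random.default_rng(3)
data = rng.normal(size=500)
grid = np.linspace(0.05, 1.0, 40)
scores = [loo_log_likelihood(data, h) for h in grid]
h_best = grid[int(np.argmax(scores))]

# For N(0,1) data the selected bandwidth is moderate, not a boundary value.
assert 0.05 < h_best < 1.0
```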
5.3.2 Wavelet Denoising

As much of the detail of wavelet denoising theory is beyond the scope of this paper, our discussion will be brief. The discrete wavelet transform (DWT) is an orthonormal transformation such that, for an orthonormal matrix $\mathcal{W}$ (defined by a choice of wavelet filter), we define the wavelet coefficients to be $W = \mathcal{W} X$. Since $\mathcal{W}$ is an orthonormal transformation, this representation both preserves energy (i.e. $\|W\|^2 = \|X\|^2$) and is an exact (alternate) representation of the signal, in that we can invert the transformation to recover the data: $X = \mathcal{W}^{-1} W = \mathcal{W}^T W$, where the last step follows from the orthonormality of $\mathcal{W}$. Wavelet denoising is a special case of the more general orthonormal-transform denoising. The methodology is simple:

1. Given a signal $X$ and a choice of wavelet filter, calculate the wavelet coefficients $W$.
2. Given a threshold $\delta$, set to zero all wavelet coefficients $W_t$ such that $|W_t| < \delta$.
3. Given the thresholded coefficients $W^{(T)}$, calculate the inverse transformation to arrive at the new signal $X^{(T)}$.

We then take $X^{(T)}$ to be our estimate of $D$. Based on the work of Donoho and Johnstone [1994] we use the universal threshold, which is derived assuming Gaussian white noise (as in this case): $\delta^{(U)} = \sqrt{2 \hat{\sigma}_e^2 \log(N)}$. Since the variance of the noise is (typically) unknown, we use the median absolute deviation (MAD) estimate $\hat{\sigma}^2_{(\mathrm{MAD})} = \left(\mathrm{median}\{|W_{1,1}|, \dots, |W_{1,N/2}|\}/0.6745\right)^2$ based on the finest-scale coefficients, which under certain assumptions (Donoho and Johnstone [1994]) is an unbiased estimate of $\sigma_e^2$.

5.3.3 Results

We fix a sample size $N$ and generate the corrupted signal $X = D + \epsilon$, varying the SNR. We then apply both the wavelet and principal curve methods to estimate $D$. We evaluate their
performance based on the mean squared error from the true signal $D$. We repeat this procedure 100 times for each sample size $N$ and each SNR to provide Monte Carlo standard deviation bounds. In Figure 6 we consider small sample sizes, $N = 32$ and $N = 64$. We plot log MSE to make the plots easier to read. In both cases we see that, in terms of mean squared error, the principal curve method outperforms the wavelet method for every signal-to-noise ratio.

Figure 6: The top panel on both sides plots the underlying (uncorrupted) signal for data sizes 32 and 64. The bottom panels plot the resulting log MSE for the principal curve method (black) and the wavelet method (blue) for varying SNRs.

What we see in Figures 7 and 8, however, is a change as we increase the sample size. For larger data sets the more theoretically well-founded wavelet denoising method vastly outperforms the principal curve method, which appears to stagnate around -8 log mean squared error regardless of sample size. There might be a variety of reasons for this. First, there exists a great deal of theory on wavelet denoising (and orthonormal transforms in general), whereas there is none for principal curves. In particular, the threshold $\delta$ was derived assuming an underlying Gaussian white noise process, while the bandwidth selection method for the principal curve was a general method for selecting bandwidths for density estimation. That being said, the principal curve method still performs generally well, and additional theory (in particular on kernel selection and bandwidth selection in this particular instance of signal denoising) could lead to an improved principal curve denoising method that may be more comparable to state-of-the-art methods.
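The universal-threshold wavelet procedure of Section 5.3.2 can be sketched with a plain Haar transform. This is a minimal illustration, not the wavelet filter used in the experiments; the rescaling constant 0.6745 is the standard Gaussian calibration of the MAD, and the sinusoid, noise level, and seed are arbitrary choices:

```python
import numpy as np

def haar_dwt(x):
    """Full Haar DWT of a length-2^J signal: [scaling, coarsest..finest details]."""
    x = x.astype(float).copy()
    coeffs, n = [], len(x)
    while n > 1:
        a = (x[0:n:2] + x[1:n:2]) / np.sqrt(2)   # approximation
        d = (x[0:n:2] - x[1:n:2]) / np.sqrt(2)   # detail
        coeffs.insert(0, d)
        x[:n // 2] = a
        n //= 2
    return np.concatenate([x[:1]] + coeffs)

def haar_idwt(w):
    """Inverse of haar_dwt (exact, by orthonormality)."""
    w = w.astype(float).copy()
    n = 1
    while n < len(w):
        a, d = w[:n].copy(), w[n:2 * n].copy()
        x = np.empty(2 * n)
        x[0::2] = (a + d) / np.sqrt(2)
        x[1::2] = (a - d) / np.sqrt(2)
        w[:2 * n] = x
        n *= 2
    return w

rng = np.random.default_rng(4)
N = 256
t = np.arange(N) / N
D = np.sin(2 * np.pi * 3 * t)                    # clean sinusoidal signal
X = D + rng.normal(0, 0.3, N)                    # corrupted signal

W = haar_dwt(X)
finest = W[N // 2:]                              # finest-scale detail coefficients
sigma = np.median(np.abs(finest)) / 0.6745       # MAD noise estimate
delta = sigma * np.sqrt(2 * np.log(N))           # universal threshold
W_thr = np.where(np.abs(W) < delta, 0.0, W)      # hard thresholding
X_hat = haar_idwt(W_thr)

# Denoising reduces MSE relative to the raw noisy signal.
assert np.mean((X_hat - D) ** 2) < np.mean((X - D) ** 2)
```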
Figure 7: The top panel on both sides plots the underlying (uncorrupted) signal for data sizes 128 and 256. The bottom panels plot the resulting log MSE for the principal curve method (black) and the wavelet method (blue) for varying SNRs.

In order to assess whether this test was biased towards the wavelet method due to the use of a true signal based on sinusoids, the experiment was re-run using a piece-wise linear signal as in Ozertem and Erdogmus [2008]. The resulting plots (similar to those in Figures 6 through 8) are in the supplementary appendix. As in this case, the principal curve method outperforms the wavelet method for data sets of small sizes ($N < 32$), but by the time $N > 64$ the wavelet method outperforms the principal curve method (in terms of MSE). It should be noted that the KDE-SCMS is $O(N^2)$ whereas the DWT is $O(N)$ (faster than the fast Fourier transform), though in principle the KDE-SCMS can be parallelized, reducing the computational load.

Figure 8: The top plots the underlying (uncorrupted) signal. The bottom plots the resulting log MSE for the principal curve method (black) and the wavelet method (blue) for varying SNRs.
6 Significance

This method of defining principal curves and surfaces offers a number of advantages over existing methods. By defining principal surfaces as inherent structures of the underlying density's geometry rather than as solutions to an optimization criterion, O&E allow for a richer definition of principal curves. Furthermore, the method extends the existing principal curves literature by defining principal surfaces in a manner that extends naturally from principal curves of dimension 1 to principal surfaces of arbitrary dimension; currently no other method of defining principal curves allows for this. Additionally, this definition allows finding principal curves in data with loops, bifurcations, and self-intersections without any changes to the definition or the algorithm.

In their definition of principal curves, O&E rely on a known density function $p(x)$. In practice, of course, this is not available and so must be estimated from the data. O&E take the approach of approximating $p(x)$ via kernel density estimation. This allows them to do a number of things. First, the smoothness constraints that are usually placed on principal curves can be removed by assuming that $p(x)$ itself is smooth, resulting in inherently smooth principal curves; if $p(x)$ is estimated via KDE the result will be inherently smooth. Furthermore, these kernel density estimates always have second-order derivatives, so principal curves (as defined by O&E) are well defined and, under certain regularity conditions, consistent. Finally, issues of overfitting and outlier robustness can be handled in the density estimation phase (which has a much larger existing literature) rather than in the principal curve approximation phase.

While this method offers new insight into principal curves, its impact in manifold learning is less well supported. The subspace-constrained mean-shift (SCMS) algorithm presented by the authors may converge to a principal surface of dimension $d$; however, the vector of values will still be in the ambient, larger dimension $D > d$.
Thus, while the points are guaranteed to
lie on a surface of lower inherent dimension, this method alone cannot be used for dimension reduction unless paired with another suitable algorithm to parametrize the principal surface or to approximate it by projecting the points onto vectors of lower dimension. Much of the literature on manifold learning assumes that there is an underlying true manifold from which the data are generated; these methods can assess their quality by determining whether they will recover the true manifold given sufficient data. The principal surface method defined by O&E does not assume an underlying manifold (in fact there is no guarantee that the resulting principal surface is itself a manifold), and it is unknown whether the method would recover the underlying manifold if the data were generated from one, or whether the principal surface of appropriate dimension can be used as a reasonable estimate of the underlying manifold. Nevertheless, principal curves as defined by O&E have enjoyed success in signal processing. In particular, they have been used in vector quantization (Ghassabeh et al. [2012]) as well as in signal denoising (Ozertem et al. [2008]). Recently, Zhang and Pedrycz [2014] proposed extending principal curves to granular principal curves in order to apply principal curves to large data sets by granulating the data. The method proposed by O&E opens the door to a new (potentially rich) principal curve/surface framework to be studied and applied.

References

M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6), 2003.

M.A. Carreira-Perpiñán. Gaussian mean shift is an EM algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, 2007.
J.E. Chacón, T. Duong, and M.P. Wand. Asymptotics for general multivariate kernel density derivative estimators. Statistica Sinica, in press, 2011.

D. Comaniciu and P. Meer. Mean shift: a robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5), 2002.

P. Delicado. Principal curves and principal oriented points, 1998.

D.L. Donoho and I.M. Johnstone. Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81, 1994.

Y.A. Ghassabeh, T. Linder, and G. Takahara. On noisy source vector quantization via a subspace constrained mean shift algorithm. Proceedings of the 26th Biennial Symposium on Communications, Kingston, Canada, 2012.

Y.A. Ghassabeh, T. Linder, and G. Takahara. On some convergence properties of the subspace constrained mean shift. Pattern Recognition, 46, 2013.

A.A. Goldstein. Convex programming in Hilbert spaces. Bulletin of the American Mathematical Society, 70, 1964.

T. Hastie and W. Stuetzle. Principal curves. Journal of the American Statistical Association, 84, 1989.

B. Kégl. Principal curves: learning, design, and applications. PhD thesis, Concordia University, Montreal, Canada, 1999.

J.P. Leiva-Murillo and A.A. Rodríguez. Algorithms for Gaussian bandwidth selection in kernel density estimators. Pattern Recognition Letters, 33(13), 2012.

E.S. Levitin and B.T. Polyak. Constrained minimization problems. USSR Computational Mathematics and Mathematical Physics, 6, pages 1-50, 1966.
X. Li, Z. Hu, and F. Wu. A note on the convergence of the mean shift. Pattern Recognition, 40, 2007.

U. Ozertem and D. Erdogmus. Local conditions for critical and principal manifolds. IEEE International Conference on Acoustics, Speech and Signal Processing, 2008.

U. Ozertem, D. Erdogmus, and O. Arikan. Piecewise smooth signal denoising via principal curve projections. IEEE International Conference on Machine Learning for Signal Processing, 2008.

S.T. Roweis and L.K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2000.

J.B. Tenenbaum, V. de Silva, and J.C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2000.

W. Wang and M.A. Carreira-Perpiñán. Manifold blurring mean shift algorithms for manifold denoising. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2010), 2010.

K.Q. Weinberger and L.K. Saul. Unsupervised learning of image manifolds by semidefinite programming. International Journal of Computer Vision, 70(1), pages 77-90, 2006.

H. Zhang and W. Pedrycz. From principal curves to granular principal curves. IEEE Transactions on Cybernetics, 44(6), 2014.