Risk Bounds for Lévy Processes in the PAC-Learning Framework


Chao Zhang, School of Computer Engineering, Nanyang Technological University
Dacheng Tao, School of Computer Engineering, Nanyang Technological University

Abstract

Lévy processes play an important role in stochastic process theory. However, since the samples they generate are non-i.i.d., statistical learning results based on the i.i.d. scenario cannot be used to study risk bounds for Lévy processes. In this paper, we present risk bounds for non-i.i.d. samples drawn from Lévy processes in the PAC-learning framework. In particular, using a concentration inequality for infinitely divisible distributions, we first prove that the risk error function is Lipschitz continuous with high probability; then, using a specific concentration inequality for Lévy processes, we obtain risk bounds for non-i.i.d. samples drawn from Lévy processes without Gaussian components. Based on the resulting risk bounds, we analyze the factors that affect their convergence and then prove the convergence.

1 Introduction

Most existing results in statistical learning theory, e.g., the probably approximately correct (PAC) learning framework (Valiant, 1984), are based on classical statistical results (e.g., the central limit theorem) that are valid for independent and identically distributed (i.i.d.) samples.¹ One of the major concerns in statistical learning theory is the risk bound. Risk bounds of learning algorithms measure the probability that a function produced by an algorithm has a sufficiently small error. Vapnik (1999)

¹ Note that Kolmogorov's central limit theorem is valid for identically distributed samples (Petrov, 1995).

Appearing in Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS) 2010, Chia Laguna Resort, Sardinia, Italy. Volume 9 of JMLR: W&CP 9. Copyright 2010 by the authors.
gave asymptotic risk bounds in terms of complexity measures of function classes, such as VC dimensions. Various risk bounds can be obtained by using different concentration inequalities and symmetrization inequalities. Bartlett et al. (2002) introduced local Rademacher complexities and presented sharp risk bounds for a particular function class. Sridharan et al. (2008) showed that the empirical minimizer can converge to the optimal value at rate 1/N if the empirical minimization of a stochastic objective is λ-strongly convex and the stochastic component is linear. Tsybakov (2004) presented a classifier that automatically adapts to the margin condition, and its error bound approaches zero at a fast rate 1/N. In a hypothesis class with some noise conditions, the rates of error bounds are always faster than 1/√N (Bousquet, 2002). However, these results share an identical assumption that the samples are i.i.d., and they are no longer valid for non-i.i.d. samples.

In practical applications, the i.i.d. assumption for samples is not always valid. For example, some financial and physical behaviors are temporally dependent, and thus the aforementioned research results are unsuitable. Mohri and Rostamizadeh (2008) gave Rademacher complexity-based risk bounds for stationary β-mixing sequences. These can be deemed as transitions between the i.i.d. scenario and non-i.i.d. scenarios, where the dependence between samples diminishes over time. In particular, by utilizing a technique of independent blocks (Yu, 1994), samples drawn from a β-mixing sequence can be transformed to an i.i.d. case, and some classical results for i.i.d. cases can then be applied to obtain risk bounds. Moreover, there are also some works on uniform laws for dependent processes (Nobel and Dembo, 1993).

In this paper, we focus on risk bounds for Lévy processes, i.e., stochastic processes with stationary and independent increments.

Definition 1.1 A stochastic process {Z_t}_{t≥0} on R^d is a Lévy process if it satisfies the following conditions:

1. Z_0 = 0, a.s.

2. For any n ≥ 1 and 0 ≤ t_0 < t_1 < ⋯ < t_n, the random variables Z_{t_0}, Z_{t_1} − Z_{t_0}, …, Z_{t_n} − Z_{t_{n−1}} are independent.

3. The increments are stationary, i.e., the distribution of Z_{s+t} − Z_s is independent of s.

4. The process is stochastically continuous, i.e., for any s, t ≥ 0 and ε > 0, we have

lim_{s→t} P( ‖Z_t − Z_s‖ > ε ) = 0.

Lévy processes play an important role in financial mathematics, physics, engineering, and actuarial science (Barndorff-Nielsen et al., 2001; Applebaum, 2004a). They contain many important special examples, e.g., Brownian motion, Poisson processes, stable processes, and subordinators, and they have been regarded as prototypes of semimartingales and Feller–Markov processes (Applebaum, 2004b; Sato, 2004). Figueroa-López and Houdré (2006) used projection estimators to estimate the Lévy density, and gave a bound on the discrepancy between a projection estimator and the orthogonal projection by using concentration inequalities for functionals of Poisson integrals.

This paper is focused on risk bounds for Lévy processes in the PAC-learning framework. The PAC-learning framework, proposed by Valiant (1984), intends to obtain, with high probability, a hypothesis that is a good approximation to an unknown target by successful learning. In this framework, a learner receives some samples and then selects an appropriate function from a function class based on these samples, such that the selected function has a low risk error with high probability.

Formally, given an input space X ⊂ R^I and its corresponding output space Y ⊂ R^J, we define Z = X × Y ⊂ R^{I+J} and assume that Z = {Z_t}_{t≥0} is a Lévy process without Gaussian components, with Z_t = (x_t, y_t). We expect to find a function T : X → Y that, given a new input x_t (t > 0), accurately predicts the output y_t. In particular, for a loss function ℓ : Y² → R, the target function T minimizes the expected risk: for any t > 0,

E_t ℓ(T(x_t), y_t) = ∫ ℓ(T(x_t), y_t) dP_t,   (1)

where P_t stands for the distribution of Z_t = (x_t, y_t).
Since P_t is unknown, T usually cannot be obtained directly from (1). Therefore, given a function class G and a sample set {Z_{t_n}}_{n=1}^N ⊂ Z with Z_{t_n} = (x_{t_n}, y_{t_n}), the estimate of the target function T is determined by minimizing the following empirical risk:

E_N ℓ(g(x), y) = (1/N) Σ_{n=1}^N ℓ(g(x_{t_n}), y_{t_n}),   g ∈ G,   (2)

which is considered an approximation to the expected risk (1). Next, we define the loss function class

F = { Z ↦ ℓ(g(x), y) : g ∈ G },

and call F "the function class" in the rest of the paper. For any t > 0 and a sample set {Z_{t_n}}_{n=1}^N ⊂ Z with t_1 < t_2 < ⋯ < t_N, we write, for any f ∈ F,

E_t f(Z_t) = ∫ f(Z_t) dP_t,    E_N f = (1/N) Σ_{n=1}^N f(Z_{t_n}),

and define the risk error function with respect to Z_t as

Φ(Z_t) = sup_{f∈F} | E_N f − E_t f(Z_t) |.

Similar to the process of obtaining risk bounds for i.i.d. samples (e.g., Bartlett et al., 2002), we use a specific concentration inequality to obtain risk bounds for the empirical processes of Lévy processes. Houdré and Marchal (2008) discussed the median, concentration, and fluctuations of Lévy processes, and gave a concentration inequality for Lévy processes without Gaussian components under 1-Lipschitz continuity. According to the discussions in (Houdré and Marchal, 2008) and (Houdré, 2002), the concentration inequality also holds under λ-Lipschitz continuity (λ > 0). By using this generalized concentration inequality, we achieve an upper bound on the risk error Φ(Z_t) in the PAC-learning framework, where Z_t is drawn from an unknown Lévy process Z without Gaussian components. The characteristic exponent ψ_1(θ) of Z_1 ∈ Z is given, for all θ ∈ R^{I+J}, by

ψ_Z(θ) = i⟨θ, a⟩ + ∫_{R^{I+J}} ( e^{i⟨θ,y⟩} − 1 − i⟨θ, y⟩ 1_{‖y‖≤1} ) ν(dy),   (3)

where a ∈ R^{I+J} and ν is a Lévy measure. Moreover, in (3), ⟨·,·⟩ and ‖·‖ are the Euclidean inner product and a norm in R^{I+J}, respectively. Then, for any t > 0, we have

E_t exp( i⟨θ, Z_t⟩ ) = exp( t ψ_1(θ) ).   (4)

Equation (4) implies that the process Z is completely determined by the distribution of Z_1, and (a, 0, ν) is the generating triplet of Z. In the next section, we provide details of (3) and (4).

The rest of this paper is organized as follows. Section 2 introduces Lévy processes.
The main results are presented in Section 3. Section 4 gives the proofs of some lemmas and theorems, and Section 5 concludes the paper.
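To make the setup above concrete, the following sketch (not from the paper; the jump rate, jump law, absolute loss, and candidate predictors are all hypothetical choices) simulates a compound Poisson process, i.e., a Lévy process with generating triplet (a, 0, ν) and no Gaussian component, samples it at ordered times t_1 < ⋯ < t_N, and evaluates the empirical risk (2) for two candidate predictors:

```python
import random

def compound_poisson_path(rate, jump, times, rng):
    """Sample a compound Poisson process (a Levy process with no Gaussian
    component) at the ordered times in `times`.  Jumps arrive at rate
    `rate`; each jump size is drawn by `jump(rng)`."""
    path, z, t_prev = [], 0.0, 0.0
    for t in times:
        # the number of jumps in (t_prev, t] is Poisson(rate * (t - t_prev)),
        # generated here from exponential inter-arrival times
        dt = t - t_prev
        s = rng.expovariate(rate)
        while s <= dt:
            z += jump(rng)
            s += rng.expovariate(rate)
        path.append(z)
        t_prev = t
    return path

rng = random.Random(0)
times = [0.1 * (n + 1) for n in range(200)]        # t_1 < t_2 < ... < t_N
xs = compound_poisson_path(3.0, lambda r: r.uniform(-1, 1), times, rng)
ys = [2.0 * x for x in xs]  # toy target y_t = 2 x_t; (x_t, y_t) is again a Levy process

def empirical_risk(g, xs, ys):
    """Empirical risk (2) with absolute loss l(y', y) = |y' - y|."""
    return sum(abs(g(x) - y) for x, y in zip(xs, ys)) / len(xs)

good = empirical_risk(lambda x: 2.0 * x, xs, ys)   # matches the target: risk 0
bad = empirical_risk(lambda x: 0.0, xs, ys)        # constant predictor
print(good, bad)
```

The sample points here are dependent (each Z_{t_{n+1}} contains Z_{t_n} plus an independent increment), which is exactly the non-i.i.d. situation the paper addresses.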

2 Lévy Processes

In this section, we briefly introduce Lévy processes; details are given in (Applebaum, 2004b; Sato, 2004). We first introduce infinitely divisible distributions, which are strongly related to Lévy processes. Afterward, we show that for a Lévy process {Z_t}_{t≥0}, the distribution of Z_t is infinitely divisible for any t > 0.

Definition 2.1 A real-valued random variable Z has an infinitely divisible distribution if for any N = 1, 2, …, there is a sequence of i.i.d. random variables {Z_n^{(N)}}_{n=1}^N such that Z = Z_1^{(N)} + ⋯ + Z_N^{(N)} holds in distribution.

According to this definition, if a random variable has an infinitely divisible distribution, it can be separated into the sum of an arbitrary number of i.i.d. random variables. Next, we introduce the characteristic exponent of an infinitely divisible distribution (Sato, 2004). Before the statement, we need the following definition (Applebaum, 2004b).

Definition 2.2 Let ν be a Borel measure defined on R^d∖{0}. Then ν is a Lévy measure if

∫_{R^d∖{0}} min{ ‖y‖², 1 } ν(dy) < ∞,   (5)

and ν({0}) = 0.

The Lévy measure describes the expected number of jumps of a certain height in a time interval of unit length. The characteristic exponent of an infinitely divisible random variable is then given by the following theorem.

Theorem 2.3 (Lévy–Khintchine) A Borel probability measure µ of a random variable Z ∈ R^d is infinitely divisible if there exists a triplet (a, A, ν), where a ∈ R^d, A is a positive-definite symmetric d × d matrix, and ν is a Lévy measure on R^d∖{0}, such that, for all θ ∈ R^d, the characteristic exponent ψ_µ is of the form

ψ_µ(θ) = i⟨a, θ⟩ − (1/2)⟨θ, Aθ⟩ + ∫_{R^d∖{0}} [ e^{i⟨θ,y⟩} − 1 − i⟨θ, y⟩ 1_{‖y‖≤1} ] ν(dy).   (6)

As stated above, an infinitely divisible distribution is completely determined by a triplet (a, A, ν), in which a stands for the drift of a Brownian motion, A is the Gaussian component, and ν is a Lévy measure. Next, we show that any Lévy process has a corresponding infinitely divisible random variable, and vice versa.
It is to be pointed out that for any Lévy process {Z_t}_{t≥0}, the distributions of Z_t (t > 0) are all determined by the distribution of Z_1. This issue has been sketched in (Applebaum, 2004b).

Lemma 2.4 Let {Z_t}_{t≥0} be a Lévy process.

(i) For any t > 0, Z_t has an infinitely divisible distribution.

(ii) For any t > 0 and θ ∈ R^d, let the characteristic exponent be

ψ_t(θ) = log E_t e^{i⟨θ, Z_t⟩}.   (7)

Then we have

ψ_t(θ) = t ψ_1(θ),   (8)

where ψ_1 is the characteristic exponent of Z_1.

The above lemma shows that any Lévy process has the property that, for all t ≥ 0,

E_t e^{i⟨θ, Z_t⟩} = e^{t ψ_1(θ)},   (9)

where ψ_1 is the characteristic exponent of Z_1, which has an infinitely divisible distribution; i.e., any Lévy process corresponds to an infinitely divisible distribution. The next theorem (Sato, 2004) shows that, given an infinitely divisible distribution, one can construct a Lévy process {Z_t}_{t≥0} such that Z_1 has that distribution.

Theorem 2.5 Suppose that a ∈ R^d, A is a positive-definite symmetric d × d matrix, and ν is a measure concentrated on R^d∖{0} such that ∫ min{1, ‖y‖²} ν(dy) < ∞. Given the triplet (a, A, ν), for each θ ∈ R^d, define

ψ(θ) = i⟨a, θ⟩ − (1/2)⟨θ, Aθ⟩ + ∫ ( e^{i⟨θ,y⟩} − 1 − i⟨θ, y⟩ 1_{‖y‖<1} ) ν(dy).   (10)

Then, there exists a Lévy process {Z_t}_{t≥0} such that Z_1 has the characteristic exponent of the form (10).

Note that, according to Lemma 2.4 and Theorem 2.5, a Lévy process {Z_t}_{t≥0} can be distinguished by the triplet (a, A, ν) of the distribution of Z_1. Thus, we call (a, A, ν) the generating triplet of {Z_t}_{t≥0}. Furthermore, according to Lemma 2.4 and Theorem 2.5, we can obtain the following result (Sato, 2004), which is necessary for establishing the main results.

Lemma 2.6 Let (a, A, ν) be the generating triplet of a Lévy process {Z_t}_{t≥0}. Then, at any time t > 0, Z_t has an infinitely divisible distribution with the triplet (at, At, νt).

This paper mainly concerns Lévy processes with a generating triplet of the form (a, 0, ν), in which the Gaussian component is zero, as shown in (3).²
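Property (9) can be checked numerically. The sketch below (illustrative only, not part of the paper's argument) takes a rate-λ Poisson process, whose characteristic exponent is ψ_1(θ) = λ(e^{iθ} − 1), and compares a Monte Carlo estimate of E e^{iθZ_t} against exp(tψ_1(θ)):

```python
import cmath
import math
import random

# Monte Carlo check of (9): E[exp(i*theta*Z_t)] = exp(t * psi_1(theta))
# for a rate-lam Poisson process, with psi_1(theta) = lam * (exp(i*theta) - 1).

def poisson_rv(mean, rng):
    """Draw a Poisson(mean) variate by inversion of the cdf."""
    u, k, p = rng.random(), 0, math.exp(-mean)
    cdf = p
    while u > cdf:
        k += 1
        p *= mean / k
        cdf += p
    return k

rng = random.Random(1)
lam, t, theta = 2.0, 1.5, 0.7
psi1 = lam * (cmath.exp(1j * theta) - 1.0)   # characteristic exponent of Z_1
exact = cmath.exp(t * psi1)                  # right-hand side of (9)

n = 100_000
mc = sum(cmath.exp(1j * theta * poisson_rv(lam * t, rng)) for _ in range(n)) / n
print(abs(mc - exact))   # Monte Carlo error, roughly O(1/sqrt(n))
```

The same check works for any Lévy process whose characteristic exponent is known in closed form; the Poisson case is used here only because Z_t is easy to sample exactly.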
² We refer the reader to (Sato, 2004) for a detailed discussion of the effect of the triplet (a, A, ν) on the path of the corresponding Lévy process.

3 Main Results

Given a Lévy process {Z_t}_{t≥0} with Lévy measure ν, we adopt the following definitions (Marcus and Rosiński, 2001). For any C > 0, define

V(C) = ∫_{‖y‖≤C} ‖y‖² ν(dy),   (11)

M(C) = ∫_{‖y‖>C} ‖y‖ ν(dy).   (12)

Next, let ν̄ be the tail of ν, i.e.,

ν̄(C) = ∫_{‖y‖>C} ν(dy).   (13)

Moreover, given a function class F and a Lévy process {Z_t}_{t≥0} with characteristic exponent (3), for any t ≥ 0, let

Φ(Z_t) = sup_{f∈F} | E_N f − E_t f(Z_t) |,   (14)

and let f_t be the function achieving the supremum in (14) over the function class F. Then, let F* = {f_t}_{t≥0} be the set of all such f_t (t ≥ 0).

In this paper, we need the following mild assumptions:

(A1) The Lévy process Z = {Z_t}_{t≥0} has a finite expectation and is centered, i.e., for any t ≥ 0,

E_t Z_t = 0.   (15)

(A2) There exists a constant K such that for every C > 0,

M(C) ≤ K V(C)/C.   (16)

(A3) There exists a constant A > 0 such that for any C > 0,

ν̄(C) ≤ A V(C)/C².   (17)

(A4) For any s, t > 0 with s ≠ t, there exists a positive constant α such that

| f_t(Z_t) − f_s(Z_s) | ≤ α ‖Z_t − Z_s‖,   (18)

where Z_s, Z_t ∈ Z and f_s, f_t ∈ F*.

(A5) There exists a positive constant β such that for any f ∈ F and Z, Z′ ∈ R^{I+J},

| f(Z) − f(Z′) | ≤ β ‖Z − Z′‖.   (19)

Remark: (i) If Z has a non-centered finite mean, one can consider the Lévy process {Z_t − E Z_t}_{t≥0}, for which assumption (A1) holds. (ii) Assumption (A5) means that the elements of F satisfy β-Lipschitz continuity.

Subsequently, we begin the main discussion of this paper. First, we prove that the function Φ(Z_t) (cf. (14)) is, with high probability, Lipschitz continuous with respect to Z_t. Afterward, according to a concentration inequality for Lévy processes (Houdré and Marchal, 2008), we obtain the risk bound on Φ(Z_t) in the PAC-learning framework. Then, we analyze the terms in the resulting bound and prove its convergence.

3.1 Risk Bounds

Under the above assumptions, we first consider the Lipschitz continuity of the function Φ(Z_t).

Lemma 3.1 Let Z = {Z_t}_{t≥0} be a Lévy process with characteristic exponent (3), and let {Z_{t_n}}_{n=1}^N be an ordered sample set drawn from Z.
If assumptions (A4) and (A5) are valid, then for any η > 0 and Z_s, Z_t ∈ Z with s ≠ t, with probability at least 1 − δ_t − δ_s,

| Φ(Z_t) − Φ(Z_s) | ≤ (2α + 6η) ‖Z_t − Z_s‖,   (20)

where

δ_t = exp{ η‖Z_t − Z_s‖/(βS) − ( η‖Z_t − Z_s‖/(βS) + tV/S² ) log( 1 + η‖Z_t − Z_s‖S/(tβV) ) },   (21)

δ_s = exp{ η‖Z_t − Z_s‖/(βS) − ( η‖Z_t − Z_s‖/(βS) + sV/S² ) log( 1 + η‖Z_t − Z_s‖S/(sβV) ) },   (22)

with S = inf{ ρ > 0 : ν({y : ‖y‖ > ρ}) = 0 } and V = ∫ ‖y‖² ν(dy).

The above lemma shows that, for a given sample set {Z_{t_n}}_{n=1}^N, Φ(Z_t) is Lipschitz continuous with high probability. For some positive real number c, let

h_c(t) = inf{ C > 0 : V(C)/C² = c/t },   t > 0.   (23)

Then, we can arrive at the main theorem of this paper.
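For a concrete Lévy measure, the quantities (11)–(13) and h_c(t) of (23) can be computed directly. The sketch below (a hypothetical example, not from the paper) takes ν(dy) = y^(−5/2) dy on (0, S]; direct integration gives V(C) = 2√min(C,S), M(C) = 2(C^(−1/2) − S^(−1/2)) for C < S, and ν̄(C) = (2/3)(C^(−3/2) − S^(−3/2)) for C < S, so that (16) holds with K = 1 and (17) with A = 1/3, and h_c(t) can be found by bisection since V(C)/C² is decreasing:

```python
import math

# Hypothetical Levy measure nu(dy) = y^(-5/2) dy on (0, S]; the closed
# forms for (11)-(13) below follow from direct integration.
S = 4.0

def V(C):        # (11): integral of y^2 * nu(dy) over {0 < y <= min(C, S)}
    return 2.0 * math.sqrt(min(C, S))

def M(C):        # (12): integral of y * nu(dy) over {C < y <= S}
    return 2.0 * (C ** -0.5 - S ** -0.5) if C < S else 0.0

def nu_bar(C):   # (13): tail mass of nu over {y > C}
    return (2.0 / 3.0) * (C ** -1.5 - S ** -1.5) if C < S else 0.0

def h(c, t):
    """h_c(t) of (23): the C > 0 with V(C)/C^2 = c/t, by bisection
    (V(C)/C^2 is strictly decreasing for this measure)."""
    lo, hi = 1e-12, 1e12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if V(mid) / mid ** 2 > c / t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# For C <= S, V(C)/C^2 = 2 C^(-3/2), so h_c(t) = (2t/c)^(2/3) exactly:
t, c = 1.0, 2.0
print(h(c, t), (2.0 * t / c) ** (2.0 / 3.0), nu_bar(h(c, t)))
```

The bisection route is what one would use in general, when V(C) has no closed form and must itself be computed by numerical quadrature.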

Theorem 3.2 Let F be a function class and {Z_{t_n}}_{n=1}^N a sample set drawn from a Lévy process with characteristic exponent (3). If assumptions (A1)–(A5) are valid and there exists a constant λ such that 2α + 6η ≤ λ, then with probability at least δ_L − δ, the following bound holds:

Φ(Z_t) ≤ b + λcK h_c(t) + (β/N) Σ_{n=1}^N E_{t−t_n} ‖Z_{t−t_n}‖,   (24)

where

δ = Ac + exp{ b/λ − ( b/λ + c ) log( 1 + b/(λc) ) },   (25)

and δ_L is the probability with which Lemma 3.1 holds.

The above theorem gives a bound on Φ(Z_t). In Lemma 3.1, we have proven that the Lipschitz continuity of Φ(Z_t) holds with high probability; thus, Theorem 3.2 is valid with high probability.

3.2 Convergence Analysis

We analyze the terms on the right-hand side of (29) and discuss some factors that affect the convergence. The following lemma (Houdré and Marchal, 2008) bounds the fluctuations of Z_t (t > 0).

Lemma 3.3 Let C_0(t) (t > 0) be the solution in C of the equation

V(C)/C² + M(C)/C = 1/t.   (26)

If assumption (A1) is valid, then

(1/4) C_0(t) ≤ E_t ‖Z_t‖ ≤ (17/8) C_0(t),   (27)

where the factor 17/8 can be replaced by 5/4 when {Z_t}_{t≥0} is symmetric. Furthermore, if assumptions (A1) and (A2) are both valid, then

h_{1/(1+K)}(t) ≤ C_0(t) ≤ h_1(t).   (28)

The above lemma shows that if assumptions (A1) and (A2) are valid, we can use h_c(t) to bound E‖Z_t‖. Therefore, by combining Theorem 3.2 and Lemma 3.3, we have the following result.

Corollary 3.4 Let F be a function class and {Z_{t_n}}_{n=1}^N a sample set drawn from a Lévy process with characteristic exponent (3). If assumptions (A1)–(A5) are valid and there exists a constant λ such that 2α + 6η ≤ λ, then with probability at least δ_L − δ, the following bound holds:

Φ(Z_t) ≤ b + λcK h_c(t) + (17β/8) (1/N) Σ_{n=1}^N h_1(t − t_n),   (29)

where

δ = Ac + exp{ b/λ − ( b/λ + c ) log( 1 + b/(λc) ) },   (30)

δ_L is the probability with which Lemma 3.1 holds, and the factor 17/8 can be replaced by 5/4 when {Z_t}_{t≥0} is symmetric.

The terms on the right-hand side of (29) determine the convergence of the risk bound with respect to the number of samples. Next, we present a convergence theorem for the risk bound for Lévy processes.
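Equation (26) and the relation between C_0(t) and h_c(t) can be illustrated numerically. The sketch below (same hypothetical Lévy measure ν(dy) = y^(−5/2) dy on (0, S] as before, for which (16) holds with K = 1; none of these constants come from the paper) solves (26) for C_0(t) by bisection and checks that C_0(t) is bracketed by h_1(t) and h_{1/(1+K)}(t), as in Lemma 3.3:

```python
import math

# Solve (26), V(C)/C^2 + M(C)/C = 1/t, for C0(t) by bisection, and check
# that C0(t) lies between h_1(t) and h_{1/(1+K)}(t) (cf. (28)), using the
# hypothetical Levy measure nu(dy) = y^(-5/2) dy on (0, S], where K = 1.
S = 4.0

def V(C):   # (11) for this measure
    return 2.0 * math.sqrt(min(C, S))

def M(C):   # (12) for this measure
    return 2.0 * (C ** -0.5 - S ** -0.5) if C < S else 0.0

def solve_decreasing(g, level, lo=1e-9, hi=1e9, iters=200):
    """Bisection for g(C) = level, assuming g is decreasing on (lo, hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > level else (lo, mid)
    return 0.5 * (lo + hi)

def C0(t):
    return solve_decreasing(lambda C: V(C) / C ** 2 + M(C) / C, 1.0 / t)

def h(c, t):
    return solve_decreasing(lambda C: V(C) / C ** 2, c / t)

t, K = 1.0, 1.0
c0, h1, hK = C0(t), h(1.0, t), h(1.0 / (1.0 + K), t)
print(c0, h1, hK)
assert min(h1, hK) <= c0 <= max(h1, hK)   # C0(t) bracketed by the two h values
```

Since V(C)/C² ≤ V(C)/C² + M(C)/C ≤ (1 + K) V(C)/C² under (16), the level 1/t is crossed by the middle expression at a point between the crossings of the outer two, which is exactly the bracketing checked above.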
Theorem 3.5 Following the notations and conditions of Theorem 3.2 and Corollary 3.4, assume that 0 < c ≪ 1/A, c = o(1), and 0 < b = o(√c) as N → ∞, where o stands for the infinitesimal of higher order. If

(1/N) Σ_{n=1}^N h_1(t − t_n) = o(1)

and there exists a constant C_ν such that

∫_{R^{I+J}} ‖y‖² ν(dy) < C_ν,   (31)

then for any given t > 0, with probability at least δ_L − δ,

lim_{N→∞} Φ(Z_t) = 0.   (32)

On the one hand, from the proof of Theorem 3.5, if there exists a positive constant C_ν satisfying (31), then for any t > 0 the rate at which b + λcK h_c(t) → 0 is completely determined by the choice of b and c. On the other hand, according to (30), the probability δ can be deemed a function of b and c, and thus the values of b and c must satisfy the constraint δ < 1. According to (30), it is clear that if b, c > 0 and c ≪ 1/A, the constraint δ < 1 holds. Therefore, in Theorem 3.5, b + λcK h_c(t) can reach zero at an arbitrary rate, and the convergence is mainly affected by the term (1/N) Σ_{n=1}^N h_1(t − t_n). Recalling the definition of h_1(t) (cf. (23)), we have that for a given t > 0, the convergence of Φ(Z_t) is determined by the Lévy measure ν. According to the definition of the Lévy measure ν (cf. (5)), if many large jumps frequently appear in the path of a Lévy process, the term (1/N) Σ_{n=1}^N h_1(t − t_n) may not vanish as N approaches infinity, and thus the convergence does not hold.
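The constraint δ < 1 discussed above can be explored numerically. The sketch below (the constants A, λ and the trial values of b, c are hypothetical placeholders, not values from the paper) evaluates δ of (30) as a function of b and c, showing that for small c < 1/A and moderate b the exponential term is tiny and δ < 1, and that δ shrinks further as b grows:

```python
import math

# delta of (30): delta = A*c + exp(b/lam - (b/lam + c) * log(1 + b/(lam*c))).
# A = 1/3 matches assumption (A3) for the earlier toy measure; lam is
# an illustrative Lipschitz constant.
A, lam = 1.0 / 3.0, 2.0

def delta(b, c):
    u = b / lam
    return A * c + math.exp(u - (u + c) * math.log(1.0 + u / c))

# Small c (well below 1/A = 3) and moderate b keep delta below 1,
# and increasing b drives the exponential term toward 0:
print(delta(5.0, 0.1), delta(10.0, 0.1))
```

Note that the Ac term does not decrease with b, which is why the discussion above requires c = o(1) as well: both pieces of δ must be driven down, while the bound (29) simultaneously keeps b + λcK h_c(t) small.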

4 Proofs of Lemmas and Theorems

In this section, we prove Lemma 3.1, Theorem 3.2, and Theorem 3.5, respectively.

4.1 Proof of Lemma 3.1

To prove this lemma, it is necessary to have the concentration inequality proposed by Houdré (2002).

Lemma 4.1 Let ν have bounded support with S = inf{ ρ > 0 : ν({y : ‖y‖ > ρ}) = 0 }, and let V = ∫_{R^d} ‖y‖² ν(dy). If f is a λ-Lipschitz function, then for any x ≥ 0, we have

P( f(Z) − E f(Z) > x ) ≤ exp{ x/(λS) − ( x/(λS) + V/S² ) log( 1 + Sx/(λV) ) }.   (33)

Lemma 4.2 Given a Lévy process with generating triplet (a, 0, ν), let ν have bounded support with S = inf{ ρ > 0 : ν({y : ‖y‖ > ρ}) = 0 }, and let V = ∫_{R^d} ‖y‖² ν(dy). If assumptions (A4) and (A5) are valid, then with probability at least 1 − δ_t − δ_s, for any real numbers s, t > 0 with s ≠ t and any η > 0,

| E_t f_t(Z_t) − E_s f_s(Z_s) | ≤ (α + 2η) ‖Z_t − Z_s‖,   (34)

and, for any Z, Z′ ∈ Z, we have

| f_t(Z) − f_s(Z′) | ≤ (α + 4η) ‖Z_t − Z_s‖,   (35)

where

δ_t = exp{ η‖Z_t − Z_s‖/(βS) − ( η‖Z_t − Z_s‖/(βS) + tV/S² ) log( 1 + η‖Z_t − Z_s‖S/(tβV) ) },

δ_s = exp{ η‖Z_t − Z_s‖/(βS) − ( η‖Z_t − Z_s‖/(βS) + sV/S² ) log( 1 + η‖Z_t − Z_s‖S/(sβV) ) }.

Proof of Lemma 4.2. According to Lemma 2.4 and Lemma 2.6, Z_t and Z_s have infinitely divisible distributions with the triplets (at, 0, νt) and (as, 0, νs), respectively. According to Lemma 4.1, for any x ≥ 0, we have

P( f_t(Z_t) − E_t f_t(Z_t) > x ) ≤ δ_t^x,   (36)

P( f_s(Z_s) − E_s f_s(Z_s) < −x ) ≤ δ_s^x,   (37)

where

δ_t^x = exp{ x/(βS) − ( x/(βS) + tV/S² ) log( 1 + xS/(tβV) ) },   (38)

δ_s^x = exp{ x/(βS) − ( x/(βS) + sV/S² ) log( 1 + xS/(sβV) ) }.   (39)

Then, with probability at least 1 − δ_t^x,

f_t(Z_t) − E_t f_t(Z_t) ≤ x,   (40)

and with probability at least 1 − δ_s^x,

f_s(Z_s) − E_s f_s(Z_s) ≥ −x.   (41)

According to (40) and (41), we have that with probability at least 1 − δ_t^x − δ_s^x,

| f_t(Z_t) − f_s(Z_s) − ( E_t f_t(Z_t) − E_s f_s(Z_s) ) | ≤ 2x.   (42)

According to assumption (A4) and (42), we have that with probability at least 1 − δ_t^x − δ_s^x,

| E_t f_t(Z_t) − E_s f_s(Z_s) | ≤ | f_t(Z_t) − f_s(Z_s) | + 2x ≤ α ‖Z_t − Z_s‖ + 2x.   (43)

Recall (42) and note that, since Z_t and Z_s are symbolic variables, it is valid to replace Z_t (resp. Z_s) with Z (resp. Z′).
Thus, according to (42) and (43), we have that with probability at least 1 − δ_t^x − δ_s^x,

| f_t(Z) − f_s(Z′) | ≤ | E_t f_t(Z) − E_s f_s(Z′) | + 2x ≤ α ‖Z_t − Z_s‖ + 4x.   (44)

Because x is an arbitrary nonnegative real number, we may set x = η ‖Z_t − Z_s‖, where η > 0 is an arbitrary real number. Then, substituting x = η ‖Z_t − Z_s‖ into (38), (39), (43), and (44) completes the proof.

Based on Lemma 4.2, we can arrive at the proof of Lemma 3.1.

Proof of Lemma 3.1.

First, according to Lemma 4.2, we have that for any f_s, f_t ∈ F*, with probability at least 1 − δ_t − δ_s,

| E_N f_t − E_N f_s | = | (1/N) Σ_{n=1}^N f_t(Z_{t_n}) − (1/N) Σ_{n=1}^N f_s(Z_{t_n}) |
≤ (1/N) Σ_{n=1}^N | f_t(Z_{t_n}) − f_s(Z_{t_n}) |
≤ (α + 4η) ‖Z_t − Z_s‖,   (45)

where δ_t and δ_s are given in (21) and (22), respectively. Thus, for any Z_t and Z_s, according to (14), (34), and (45), we have that with probability at least 1 − δ_t − δ_s,

Φ(Z_t) − Φ(Z_s) = sup_{f∈F} | E_N f − E_t f(Z_t) | − sup_{f∈F} | E_N f − E_s f(Z_s) |
= | E_N f_t − E_t f_t(Z_t) | − | E_N f_s − E_s f_s(Z_s) |
≤ | E_t f_t(Z_t) − E_s f_s(Z_s) | + | E_N f_t − E_N f_s |
≤ (2α + 6η) ‖Z_t − Z_s‖.   (46)

This completes the proof.

4.2 Proof of Theorem 3.2

To prove Theorem 3.2, it is necessary to introduce a concentration inequality proposed by Houdré and Marchal (2008).

Lemma 4.3 Suppose that Z = {Z_t}_{t≥0} is a Lévy process with characteristic exponent (3) and f is a λ-Lipschitz function. If assumptions (A1) and (A2) are valid, then for any b > 0 and any c, t > 0 such that C = h_c(t) supports assumption (A3), we have

P( f(Z_t) − E_t f(Z_t) ≥ b + λcK h_c(t) ) ≤ Ac + exp{ b/λ − ( b/λ + c ) log( 1 + b/(λc) ) }.   (47)

Based on the above concentration inequality, we prove the main theorem as follows.

Proof of Theorem 3.2. We only consider the bound on E_{t_1,…,t_N,t} Φ(Z_t); the results can then be obtained by Lemma 3.1 and Lemma 4.3. According to assumption (A5), we have

E_{t_1,…,t_N,t} Φ(Z_t)
= E_{t_1,…,t_N,t} sup_{f∈F} | E_t f(Z_t) − (1/N) Σ_{n=1}^N f(Z_{t_n}) |
= E_{t_1,…,t_N,t} | E_t f_t(Z_t) − (1/N) Σ_{n=1}^N f_t(Z_{t_n}) |
≤ (1/N) Σ_{n=1}^N E_{t,t_n} | f_t(Z_t) − f_t(Z_{t_n}) |
≤ (β/N) Σ_{n=1}^N E_{t,t_n} ‖Z_t − Z_{t_n}‖.   (48)

Since Lévy processes have stationary and independent increments, according to (48), we have

E_{t_1,…,t_N,t} Φ(Z_t) ≤ (β/N) Σ_{n=1}^N E_{t−t_n} ‖Z_{t−t_n}‖.   (49)

This completes the proof.

4.3 Proof of Theorem 3.5

Proof of Theorem 3.5.
According to (23), for any c, t > 0, we have

h_c(t) = √( t V(h_c(t)) / c ).   (50)

According to (11) and (31), we have

b h_c(t) = b √( t V(h_c(t)) / c ) ≤ (b/√c) √( t C_ν ).

Since b = o(√c) as N → ∞, we have b h_c(t) = o(1) as N → ∞. Similarly, according to (11), (31), and (50), we have

λcK h_c(t) = λcK √( t V(h_c(t)) / c ) ≤ λK √( t c C_ν ).   (51)

Since λ and K are given constants, λcK h_c(t) = O(√c) as N → ∞, where O stands for the infinitesimal of the same order; thus, λcK h_c(t) = o(1) because c = o(1). Finally, by using the condition (1/N) Σ_{n=1}^N h_1(t − t_n) = o(1), we complete the proof.

5 Conclusion

We have presented risk bounds for Lévy processes without Gaussian components in the PAC-learning framework.

We first briefly introduced Lévy processes and discussed the relationship between infinitely divisible distributions and Lévy processes, which is necessary for the proof of the main result. Concentration inequalities are the main tools for achieving risk bounds for empirical processes. Houdré and Marchal (2008) proposed a concentration inequality for Lévy processes without Gaussian components; however, it requires the function to be 1-Lipschitz continuous. Thus, we first generalized it to the λ-Lipschitz (λ > 0) continuous version, following the discussions in (Houdré, 2002) and (Houdré and Marchal, 2008). Then, by using a concentration inequality for infinitely divisible distributions (Houdré, 2002), we proved that, with high probability, the function Φ(Z_t) is Lipschitz continuous with respect to Z_t. Based on these results, we achieved the risk bounds for Lévy processes without Gaussian components. By using the bound on the fluctuations of Lévy processes (Houdré and Marchal, 2008), we applied the resulting risk bound to obtain a convergence theorem, which shows that the convergence of the risk bound is mainly determined by the Lévy measure ν. This implies that if many large jumps appear frequently in the path of a Lévy process, the risk bound for the process may not converge to zero as the sample number approaches infinity.

In our future work, we will attempt to study risk bounds for other stochastic processes via concentration inequalities, e.g., stochastic processes with exchangeable increments, which are a well-known generalization of stochastic processes with independent increments (Kallenberg, 1973; Kallenberg, 1975).

Acknowledgements

The project was fully supported by the Nanyang Technological University SUG Grant under project number M58020010.

References

L. G. Valiant (1984). A Theory of the Learnable. Communications of the ACM, 27(11):1134-1142.

V. V. Petrov (1995). Limit Theorems of Probability Theory: Sequences of Independent Random Variables.
USA: Oxford University Press.

V. N. Vapnik (1999). The Nature of Statistical Learning Theory (Information Science and Statistics). New York: Springer.

P. L. Bartlett, O. Bousquet and S. Mendelson (2002). Localized Rademacher Complexities. In COLT '02: Proceedings of the 15th Annual Conference on Computational Learning Theory, 44-58.

K. Sridharan, N. Srebro and S. Shalev-Shwartz (2008). Fast Rates for Regularized Objectives. In NIPS '08: Neural Information Processing Systems.

A. Tsybakov (2004). Optimal Aggregation of Classifiers in Statistical Learning. Annals of Statistics, 32:135-166.

O. Bousquet (2002). Concentration Inequalities and Empirical Processes Theory Applied to the Analysis of Learning Algorithms. PhD thesis, Ecole Polytechnique.

M. Mohri and A. Rostamizadeh (2008). Rademacher complexity bounds for non-i.i.d. processes. In NIPS '08: Neural Information Processing Systems, 1097-1104. Vancouver, Canada: MIT Press.

B. Yu (1994). Rates of convergence for empirical processes of stationary mixing sequences. Annals of Probability, 22(1):94-116.

A. B. Nobel and A. Dembo (1993). A note on uniform laws of averages for dependent processes. Statistics and Probability Letters, 17:169-172.

O. E. Barndorff-Nielsen, T. Mikosch and S. I. Resnick (2001). Lévy Processes: Theory and Applications. Birkhäuser.

D. Applebaum (2004a). Lévy processes - from probability theory to finance and quantum groups. Notices of the American Mathematical Society, 51(11):1336-1347.

D. Applebaum (2004b). Lévy Processes and Stochastic Calculus. Cambridge: Cambridge University Press.

K. Sato (2004). Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press.

J. E. Figueroa-López and C. Houdré (2006). Risk bounds for the non-parametric estimation of Lévy processes. IMS Lecture Notes-Monograph Series, High Dimensional Probability, 51:96-116.

C. Houdré and P. Marchal (2008). Median, concentration and fluctuations for Lévy processes. Stochastic Processes and their Applications, 118(5):852-863.

C. Houdré (2002).
Remarks on deviation inequalities for functions of infinitely divisible random vectors. Annals of Probability, 30(3):1223-1237.

M. B. Marcus and J. Rosiński (2001). L¹-Norm of Infinitely Divisible Random Vectors and Certain Stochastic Integrals. Electronic Communications in Probability, 6:15-29.

O. Kallenberg (1973). Canonical representations and convergence criteria for processes with interchangeable increments. Z. Wahrscheinlichkeitstheorie verw. Gebiete, 27:23-36.

O. Kallenberg (1975). On symmetrically distributed random measures. Transactions of the American Mathematical Society, 202:105-121.