Learning Patterns for Detection with Multiscale Scan Statistics

Size: px
Start display at page:

Download "Learning Patterns for Detection with Multiscale Scan Statistics"

Transcription

1 Learning Patterns for Detection with Multiscale Scan Statistics J. Sharpnack 1 1 Statistics Department UC Davis UC Davis Statistics Seminar 2018 Work supported by NSF DMS

2 Hyperspectral Gas Detection

3 Hyperspectral Gas Detection Each pixel of the image is a light spectrum, and we are interested in detecting a single chemical signature. Images from [Manolakis et al. 2014, Manolakis & Shaw 2002]

4 Hyperspectral Gas Detection Gas signature level in each pixel. Dataset and chemical signature comparison code from Dimitris G. Manolakis and DTRA.

5 Hyperspectral Gas Detection Decreased SNR by 1 5th. Can you see it?

6 Hyperspectral Gas Detection If classification answers the question, what am I seeing? detection answers the question, do I see anything at all?

7 Anomaly detection goals Goal 1: Reliably detect anomalous patterns within images beyond what the human eye can see at the precise information theoretic limit.

8 Detection applications Contaminant detection in water networks, real-time surveillance system, radiation monitoring, fire detection and other remote sensing applications, medical imaging and automated radiology, early detection of pathogen outbreaks.

9 Rectangular multiscale scan statistic Scan every rectangle (vary location and scale) looking for abnormal concentrations [S, Arias-Castro 16]. Figure: A rectangle over the active region.

10 Scan statistic Represent image over [ L, L] [ L, L] as matrix Y and consider scanning rectangle over pixels [ H 0, H 0 ] [ H 1, H 1 ]. Define the pattern to be P k,l = 1 (2H0 + 1) (2H 1 + 1) then the scan is the following convolution, (P Y ) k,l = H 0 k = H 0 H 1 l = H 1 Y k k,l l P k,l, where defined. The single-scale scan statistic is ŝ = max (P Y ) k,l. k,l

11 Other patterns General pattern over the domain Ω = [ L, L] d can be hidden in the noisy tensor. Figure: A simulated time series with an embedded sinusoidal signal with values on the y-axis (d = 1).

12 Multiscale scan Scanning with many pattern dimensions, H 0,..., H d, is called a multiscale scan. How do we compare scan statistics at different scales?

13 Anomaly detection goals Goal 2: General purpose analysis for multiscale scan statistics for a large class of patterns.

14 Multiple Tensors Figure: Multiple tensors, i = 1,..., n (n = 5), with different locations and scales, and two possible patterns (left and right).

15 Anomaly detection goals Goal 3: Leverage database of tensors to simulaneously learn and detect anomalous patterns.

16 Prior work [Naus 65] scan statistics introduced for point cloud data [Siegmund, Worsley 95] limit distribution of 1-dimensional scan [Glaz and Zhang 04, Kabluchko 11] limit in d-dimensions [Arias-Castro et al. 05, 11] scan for blob-like patterns [Dumbgen, Spokoiny, 01] scale adaptive scan statistic (d = 1) [S, Arias-Castro 16] scale adaptive rectangular scan [Proksch at al. 17] scale adaptive smooth patterns This work: learning and detecting general smooth patterns in a database of tensors with scale adaptive methods

17 Continuous model Pattern f F C 1 over [ 1, 1] d, f L2 = 1. Data is random measure dx i with domain [ L, L] d. Scale dilation f h := h 1/2 f (./h), h = j h j, h R d Null hypothesis: data is just noise (dw i is d-dimensional Wiener process) Alternative hypothesis: there is a signal f at location t i, and scale h i. H 0 : dx i (τ) = dw i (τ), i = 1,..., n H 1 : dx i (τ) = µf h i (t i τ)dτ + dw i (τ) for some f F, and (t i, h i ) D, i = 1,..., n.

18 Continuous multiscale scan statistic Convolution at scale h, (f h dx i )(t) = f h (τ)dx i (t τ) = 1 h f (τ)dx i (t hτ), Scale corrected multiscale scan statistic: s(x i ; f ) := max h H v h h H := j [1, L) t T h := j [ (L h j ), L h j ] v h = 2 j log(l/h j) ( max t T h (f h dx i )(t) v h ). (1) Test if the pattern f centered at t and scaled by h is hidden within tensor X i.

19 Learning patterns Given a dataset of images X i, i = 1,..., n then can we also learn the pattern f F? S n (X ; F) := max f F 1 n n s(x i ; f ) i=1 (PAMSS) The pattern adapted multiscale scan statistic (PAMSS) averages the MSS for each tensor. Smoothness conditions on F are required: bounded variation (TVC) or average Hölder condition (AHC).

20 Smoothness assumptions Assumption TVC: Define the isotropic total variation, f TV := f (u) 2 du, function are of bounded variation, Ω γ 1 > 0 s.t. f F, f TV γ 1. (TVC) or Assumption AHC: Define the Hölder functional, A t,s (f ) := f (t z) f (s z) 2 dz. Ω L functions have bounded average Hölder condition, 0 < γ 2 1 s.t. f F, A t,s (f ) c A t s 2γ 2 2. (AHC)

21 Type 1 error control We can simulate from the null distribution to obtain a significance level for ( ) s(x i ; f ) := max v h max(f h dx i )(t) v h. (2) h H t T h [Dumbgen & Spokoiny 01] showed that under H 0 for L large enough s(x i, f ) = O P (log log L) for functions satisfying (TVC) in 1D. We need a more precise control to analyze PAMSS (tail bound) s(x i, f ) is subexponential random variable.

22 SubGaussian process Definition We say that a random field, {Z(ι)} ι I, is a (zero mean) standard subgaussian process if there exists a constant u 0 > 0 such that P { Z(ι 0 ) Z(ι 1 ) u} 2 exp P {Z(ι 0 ) u} exp ( u2 2ν(ι 0, ι 1 ) ), (3) ) ( u2, (4) 2 for any ι 0, ι 1 I, u > u 0, and ν(ι 0, ι 1 ) = E(Z(ι 0 ) Z(ι 1 )) 2, is the canonical distance. Under H 0, {(f h dx i )(t) : (t, h) D} is a subgaussian random field with canonical distance ν f ((h 0, t 0 ), (h 1, t 1 )) := f h0 (t 0.) f h1 (t 1.) L2.

23 Dudley s chaining Define the covering number of metric space (I, ν), N (I, ν, ɛ), to be the number of balls of ν-radius ɛ that is required to cover I. Theorem (Dudley s entropy (tail) bound) { } P sup Z(η) > u c D + C η I Ce u2 2, such that D = 0 2 log N (I, ν, ɛ)dɛ for universal constants c, C. Specifically, if N (I, ν, ɛ) Γɛ ρ. then (as Γ ) D = 2 log Γ + o(1).

24 Dudley s chaining Find a sequence (the chain) {η k } k=0 such that lim k η k = η, sup Z(η) = sup Z(η 0 ) + (Z(η 1 ) Z(η 0 )) + (Z(η 2 ) Z(η 1 )) +... η I η 0,η 1,...

25 Dudley s chaining Find a sequence (the chain) {η k } k=0 such that lim k η k = η, sup Z(η) = sup Z(η 0 ) + (Z(η 1 ) Z(η 0 )) + (Z(η 2 ) Z(η 1 )) +... η I η 0,η 1,... Z(η i+1 ) Z(η i ) is prop. to ν(η i+1, η i ) Choose covering layers such that N grows like 2 2k Variance of the bound is dominated by Z(η 0 )

26 Chaining standardized suprema Theorem Let Z(η) be a standard subgaussian process over an index set I. Suppose that metric (I, d Z ) has covering number, N s.t., N (I, d Z, ɛ) Γɛ ρ. (5) Then there exists an Γ 0 > 0 such that for any Γ Γ 0, the following supremum is bounded in probability, ( ) } { c0 P log Γ sup Z(η) 2 log Γ a 0 log log Γ > u e u, η I (6) for u > u 0 where u 0, c 0, a 0 are constant depending on ρ (but not on Γ). In words, the supremum of such a subgaussian process is subexponential with location and rate parameter, (2 log Γ) 1/2 (omitting the log log term).

27 Proof sketch for chaining bound For iid normals, {z i } N i=1, from union bound { P max z i > } 2 log N + u 2 e u2 2. i Generic chaining: 2 log N + u 2 u + 2 log N/(2u) Our chaining: 2 log N + u 2 2 log N + u 2 /(2 2 log N) { P 2 ( 2 log N max z i ) 2 log N i Modify chain so that } > u e u. (1) start the chain at a deeper level (N large enough) (2) make the covers grow slowly N a ak for a 1.

28 Main Theorem Theorem Let F be finite and assume that either all functions in F satisfy either (TVC) or (AHC). Let K log F n (δ) := ( F δ K n log ( F δ then for some constant K, under H 0, (7) ) ), log F n K + log δ, log F > n K + log δ P {S n (X, F) > F n (δ) log log L} δ. (8)

29 Proof of main theorem Lemma Then, under the above conditions, there is a constant C depending on d alone such that 1. Suppose that (TVC) holds for the class F, then ν f ((t, h), (t, h )) 2 Cγ 1 t t 2 ( ) 2 h h + 1. h 2. [Proksch et al. 16] Suppose that (AHC) holds for the class F, then ν f ((t, h), (t, h )) 2 t j t 2γ 2 j h C + j h 2γ 2 j h j 2γ 2 h j h j. 2 2γ 2

30 Proof of main theorem Lemma Suppose that f F satisfies either (TVC) or (AHC). Let l {0,..., log 2 L } d, and H 2 (l) = j [2 l j, 2 l j +1 ]. Then { } ( P c 1 max v h (fh dx i ) )(t) v h a1 log log L > u e u h H l,t T h for constants a 1, c 1 > 0 depending on γ, d only. With the union bound over l, { P c 2 sn(x i }, f ) log log L a 2 > u e u, then use subexponential Bernstein inequality.

31 Asymptotic Distinguishability Corollary Suppose that log F = o(n), and recall that under the alternative hypothesis, H 1, X i has an embedded pattern f at scale h i and v 2 = h i j log(l/hi j ), and the noise is a standard Wiener process. Suppose also that hj i Lc for some 0 c < 1 for all i, j, then the PAMSS is asymptotically powerful (has diminishing probability of type 1 and type 2 error) if µ 2 n i=1 v 2 h i n i=1 v h i. (9) We take this result to mean that as long as the function class, F, does not grow exponentially in n, we achieve asymptotic power under the same conditions as if F = 1.

32 Summary Proposed the Pattern Adapted Multiscale Scan Statistic for learning anomalous patterns We proved a refined concentration result for the supremum of subgaussian processes We controlled the error probabilities for the PAMSS showing that it can learn the function from a class for free as long as n log F For a link to this paper: Thanks!

Information theoretic perspectives on learning algorithms

Information theoretic perspectives on learning algorithms Information theoretic perspectives on learning algorithms Varun Jog University of Wisconsin - Madison Departments of ECE and Mathematics Shannon Channel Hangout! May 8, 2018 Jointly with Adrian Tovar-Lopez

More information

RANDOM FIELDS AND GEOMETRY. Robert Adler and Jonathan Taylor

RANDOM FIELDS AND GEOMETRY. Robert Adler and Jonathan Taylor RANDOM FIELDS AND GEOMETRY from the book of the same name by Robert Adler and Jonathan Taylor IE&M, Technion, Israel, Statistics, Stanford, US. ie.technion.ac.il/adler.phtml www-stat.stanford.edu/ jtaylor

More information

Supplementary Material for Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data

Supplementary Material for Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data Supplementary Material for Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data Raymond K. W. Wong Department of Statistics, Texas A&M University Xiaoke Zhang Department

More information

Poisson Approximation for Two Scan Statistics with Rates of Convergence

Poisson Approximation for Two Scan Statistics with Rates of Convergence Poisson Approximation for Two Scan Statistics with Rates of Convergence Xiao Fang (Joint work with David Siegmund) National University of Singapore May 28, 2015 Outline The first scan statistic The second

More information

Class 2 & 3 Overfitting & Regularization

Class 2 & 3 Overfitting & Regularization Class 2 & 3 Overfitting & Regularization Carlo Ciliberto Department of Computer Science, UCL October 18, 2017 Last Class The goal of Statistical Learning Theory is to find a good estimator f n : X Y, approximating

More information

Geometric inference for measures based on distance functions The DTM-signature for a geometric comparison of metric-measure spaces from samples

Geometric inference for measures based on distance functions The DTM-signature for a geometric comparison of metric-measure spaces from samples Distance to Geometric inference for measures based on distance functions The - for a geometric comparison of metric-measure spaces from samples the Ohio State University wan.252@osu.edu 1/31 Geometric

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results

Introduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results Introduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics

More information

g(x) = P (y) Proof. This is true for n = 0. Assume by the inductive hypothesis that g (n) (0) = 0 for some n. Compute g (n) (h) g (n) (0)

g(x) = P (y) Proof. This is true for n = 0. Assume by the inductive hypothesis that g (n) (0) = 0 for some n. Compute g (n) (h) g (n) (0) Mollifiers and Smooth Functions We say a function f from C is C (or simply smooth) if all its derivatives to every order exist at every point of. For f : C, we say f is C if all partial derivatives to

More information

Supremum of simple stochastic processes

Supremum of simple stochastic processes Subspace embeddings Daniel Hsu COMS 4772 1 Supremum of simple stochastic processes 2 Recap: JL lemma JL lemma. For any ε (0, 1/2), point set S R d of cardinality 16 ln n S = n, and k N such that k, there

More information

Concentration behavior of the penalized least squares estimator

Concentration behavior of the penalized least squares estimator Concentration behavior of the penalized least squares estimator Penalized least squares behavior arxiv:1511.08698v2 [math.st] 19 Oct 2016 Alan Muro and Sara van de Geer {muro,geer}@stat.math.ethz.ch Seminar

More information

Gaussian Random Fields: Excursion Probabilities

Gaussian Random Fields: Excursion Probabilities Gaussian Random Fields: Excursion Probabilities Yimin Xiao Michigan State University Lecture 5 Excursion Probabilities 1 Some classical results on excursion probabilities A large deviation result Upper

More information

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539 Brownian motion Samy Tindel Purdue University Probability Theory 2 - MA 539 Mostly taken from Brownian Motion and Stochastic Calculus by I. Karatzas and S. Shreve Samy T. Brownian motion Probability Theory

More information

Tools from Lebesgue integration

Tools from Lebesgue integration Tools from Lebesgue integration E.P. van den Ban Fall 2005 Introduction In these notes we describe some of the basic tools from the theory of Lebesgue integration. Definitions and results will be given

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued

Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and

More information

Consistency of Modularity Clustering on Random Geometric Graphs

Consistency of Modularity Clustering on Random Geometric Graphs Consistency of Modularity Clustering on Random Geometric Graphs Erik Davis The University of Arizona May 10, 2016 Outline Introduction to Modularity Clustering Pointwise Convergence Convergence of Optimal

More information

Course 212: Academic Year Section 1: Metric Spaces

Course 212: Academic Year Section 1: Metric Spaces Course 212: Academic Year 1991-2 Section 1: Metric Spaces D. R. Wilkins Contents 1 Metric Spaces 3 1.1 Distance Functions and Metric Spaces............. 3 1.2 Convergence and Continuity in Metric Spaces.........

More information

Information-Theoretic Limits of Group Testing: Phase Transitions, Noisy Tests, and Partial Recovery

Information-Theoretic Limits of Group Testing: Phase Transitions, Noisy Tests, and Partial Recovery Information-Theoretic Limits of Group Testing: Phase Transitions, Noisy Tests, and Partial Recovery Jonathan Scarlett jonathan.scarlett@epfl.ch Laboratory for Information and Inference Systems (LIONS)

More information

L p Functions. Given a measure space (X, µ) and a real number p [1, ), recall that the L p -norm of a measurable function f : X R is defined by

L p Functions. Given a measure space (X, µ) and a real number p [1, ), recall that the L p -norm of a measurable function f : X R is defined by L p Functions Given a measure space (, µ) and a real number p [, ), recall that the L p -norm of a measurable function f : R is defined by f p = ( ) /p f p dµ Note that the L p -norm of a function f may

More information

Problem set 1, Real Analysis I, Spring, 2015.

Problem set 1, Real Analysis I, Spring, 2015. Problem set 1, Real Analysis I, Spring, 015. (1) Let f n : D R be a sequence of functions with domain D R n. Recall that f n f uniformly if and only if for all ɛ > 0, there is an N = N(ɛ) so that if n

More information

l 1 -Regularized Linear Regression: Persistence and Oracle Inequalities

l 1 -Regularized Linear Regression: Persistence and Oracle Inequalities l -Regularized Linear Regression: Persistence and Oracle Inequalities Peter Bartlett EECS and Statistics UC Berkeley slides at http://www.stat.berkeley.edu/ bartlett Joint work with Shahar Mendelson and

More information

Math 127C, Spring 2006 Final Exam Solutions. x 2 ), g(y 1, y 2 ) = ( y 1 y 2, y1 2 + y2) 2. (g f) (0) = g (f(0))f (0).

Math 127C, Spring 2006 Final Exam Solutions. x 2 ), g(y 1, y 2 ) = ( y 1 y 2, y1 2 + y2) 2. (g f) (0) = g (f(0))f (0). Math 27C, Spring 26 Final Exam Solutions. Define f : R 2 R 2 and g : R 2 R 2 by f(x, x 2 (sin x 2 x, e x x 2, g(y, y 2 ( y y 2, y 2 + y2 2. Use the chain rule to compute the matrix of (g f (,. By the chain

More information

Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½

Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½ University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 1998 Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½ Lawrence D. Brown University

More information

Testing Algebraic Hypotheses

Testing Algebraic Hypotheses Testing Algebraic Hypotheses Mathias Drton Department of Statistics University of Chicago 1 / 18 Example: Factor analysis Multivariate normal model based on conditional independence given hidden variable:

More information

Integral representations in models with long memory

Integral representations in models with long memory Integral representations in models with long memory Georgiy Shevchenko, Yuliya Mishura, Esko Valkeila, Lauri Viitasaari, Taras Shalaiko Taras Shevchenko National University of Kyiv 29 September 215, Ulm

More information

Lecture No 1 Introduction to Diffusion equations The heat equat

Lecture No 1 Introduction to Diffusion equations The heat equat Lecture No 1 Introduction to Diffusion equations The heat equation Columbia University IAS summer program June, 2009 Outline of the lectures We will discuss some basic models of diffusion equations and

More information

The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan

The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan Background: Global Optimization and Gaussian Processes The Geometry of Gaussian Processes and the Chaining Trick Algorithm

More information

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical

More information

Fast learning rates for plug-in classifiers under the margin condition

Fast learning rates for plug-in classifiers under the margin condition Fast learning rates for plug-in classifiers under the margin condition Jean-Yves Audibert 1 Alexandre B. Tsybakov 2 1 Certis ParisTech - Ecole des Ponts, France 2 LPMA Université Pierre et Marie Curie,

More information

Notes on Distributions

Notes on Distributions Notes on Distributions Functional Analysis 1 Locally Convex Spaces Definition 1. A vector space (over R or C) is said to be a topological vector space (TVS) if it is a Hausdorff topological space and the

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

LECTURE 2: LOCAL TIME FOR BROWNIAN MOTION

LECTURE 2: LOCAL TIME FOR BROWNIAN MOTION LECTURE 2: LOCAL TIME FOR BROWNIAN MOTION We will define local time for one-dimensional Brownian motion, and deduce some of its properties. We will then use the generalized Ray-Knight theorem proved in

More information

sample lectures: Compressed Sensing, Random Sampling I & II

sample lectures: Compressed Sensing, Random Sampling I & II P. Jung, CommIT, TU-Berlin]: sample lectures Compressed Sensing / Random Sampling I & II 1/68 sample lectures: Compressed Sensing, Random Sampling I & II Peter Jung Communications and Information Theory

More information

ON CONVERGENCE RATES OF GIBBS SAMPLERS FOR UNIFORM DISTRIBUTIONS

ON CONVERGENCE RATES OF GIBBS SAMPLERS FOR UNIFORM DISTRIBUTIONS The Annals of Applied Probability 1998, Vol. 8, No. 4, 1291 1302 ON CONVERGENCE RATES OF GIBBS SAMPLERS FOR UNIFORM DISTRIBUTIONS By Gareth O. Roberts 1 and Jeffrey S. Rosenthal 2 University of Cambridge

More information

Lecture 21: Minimax Theory

Lecture 21: Minimax Theory Lecture : Minimax Theory Akshay Krishnamurthy akshay@cs.umass.edu November 8, 07 Recap In the first part of the course, we spent the majority of our time studying risk minimization. We found many ways

More information

Tail inequalities for additive functionals and empirical processes of. Markov chains

Tail inequalities for additive functionals and empirical processes of. Markov chains Tail inequalities for additive functionals and empirical processes of geometrically ergodic Markov chains University of Warsaw Banff, June 2009 Geometric ergodicity Definition A Markov chain X = (X n )

More information

DECOUPLING LECTURE 6

DECOUPLING LECTURE 6 18.118 DECOUPLING LECTURE 6 INSTRUCTOR: LARRY GUTH TRANSCRIBED BY DONGHAO WANG We begin by recalling basic settings of multi-linear restriction problem. Suppose Σ i,, Σ n are some C 2 hyper-surfaces in

More information

ASYMPTOTIC EQUIVALENCE OF DENSITY ESTIMATION AND GAUSSIAN WHITE NOISE. By Michael Nussbaum Weierstrass Institute, Berlin

ASYMPTOTIC EQUIVALENCE OF DENSITY ESTIMATION AND GAUSSIAN WHITE NOISE. By Michael Nussbaum Weierstrass Institute, Berlin The Annals of Statistics 1996, Vol. 4, No. 6, 399 430 ASYMPTOTIC EQUIVALENCE OF DENSITY ESTIMATION AND GAUSSIAN WHITE NOISE By Michael Nussbaum Weierstrass Institute, Berlin Signal recovery in Gaussian

More information

Nonparametric regression with martingale increment errors

Nonparametric regression with martingale increment errors S. Gaïffas (LSTA - Paris 6) joint work with S. Delattre (LPMA - Paris 7) work in progress Motivations Some facts: Theoretical study of statistical algorithms requires stationary and ergodicity. Concentration

More information

Empirical Processes and random projections

Empirical Processes and random projections Empirical Processes and random projections B. Klartag, S. Mendelson School of Mathematics, Institute for Advanced Study, Princeton, NJ 08540, USA. Institute of Advanced Studies, The Australian National

More information

Learning Theory. Ingo Steinwart University of Stuttgart. September 4, 2013

Learning Theory. Ingo Steinwart University of Stuttgart. September 4, 2013 Learning Theory Ingo Steinwart University of Stuttgart September 4, 2013 Ingo Steinwart University of Stuttgart () Learning Theory September 4, 2013 1 / 62 Basics Informal Introduction Informal Description

More information

Chapter 9: Basic of Hypercontractivity

Chapter 9: Basic of Hypercontractivity Analysis of Boolean Functions Prof. Ryan O Donnell Chapter 9: Basic of Hypercontractivity Date: May 6, 2017 Student: Chi-Ning Chou Index Problem Progress 1 Exercise 9.3 (Tightness of Bonami Lemma) 2/2

More information

MATH MEASURE THEORY AND FOURIER ANALYSIS. Contents

MATH MEASURE THEORY AND FOURIER ANALYSIS. Contents MATH 3969 - MEASURE THEORY AND FOURIER ANALYSIS ANDREW TULLOCH Contents 1. Measure Theory 2 1.1. Properties of Measures 3 1.2. Constructing σ-algebras and measures 3 1.3. Properties of the Lebesgue measure

More information

On the Complexity of Best Arm Identification with Fixed Confidence

On the Complexity of Best Arm Identification with Fixed Confidence On the Complexity of Best Arm Identification with Fixed Confidence Discrete Optimization with Noise Aurélien Garivier, Emilie Kaufmann COLT, June 23 th 2016, New York Institut de Mathématiques de Toulouse

More information

Sobolev Spaces. Chapter 10

Sobolev Spaces. Chapter 10 Chapter 1 Sobolev Spaces We now define spaces H 1,p (R n ), known as Sobolev spaces. For u to belong to H 1,p (R n ), we require that u L p (R n ) and that u have weak derivatives of first order in L p

More information

Plug-in Approach to Active Learning

Plug-in Approach to Active Learning Plug-in Approach to Active Learning Stanislav Minsker Stanislav Minsker (Georgia Tech) Plug-in approach to active learning 1 / 18 Prediction framework Let (X, Y ) be a random couple in R d { 1, +1}. X

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57

More information

21.2 Example 1 : Non-parametric regression in Mean Integrated Square Error Density Estimation (L 2 2 risk)

21.2 Example 1 : Non-parametric regression in Mean Integrated Square Error Density Estimation (L 2 2 risk) 10-704: Information Processing and Learning Spring 2015 Lecture 21: Examples of Lower Bounds and Assouad s Method Lecturer: Akshay Krishnamurthy Scribes: Soumya Batra Note: LaTeX template courtesy of UC

More information

Risk Bounds for Lévy Processes in the PAC-Learning Framework

Risk Bounds for Lévy Processes in the PAC-Learning Framework Risk Bounds for Lévy Processes in the PAC-Learning Framework Chao Zhang School of Computer Engineering anyang Technological University Dacheng Tao School of Computer Engineering anyang Technological University

More information

Asymptotic results for empirical measures of weighted sums of independent random variables

Asymptotic results for empirical measures of weighted sums of independent random variables Asymptotic results for empirical measures of weighted sums of independent random variables B. Bercu and W. Bryc University Bordeaux 1, France Workshop on Limit Theorems, University Paris 1 Paris, January

More information

Laplace s Equation. Chapter Mean Value Formulas

Laplace s Equation. Chapter Mean Value Formulas Chapter 1 Laplace s Equation Let be an open set in R n. A function u C 2 () is called harmonic in if it satisfies Laplace s equation n (1.1) u := D ii u = 0 in. i=1 A function u C 2 () is called subharmonic

More information

Generalization Bounds in Machine Learning. Presented by: Afshin Rostamizadeh

Generalization Bounds in Machine Learning. Presented by: Afshin Rostamizadeh Generalization Bounds in Machine Learning Presented by: Afshin Rostamizadeh Outline Introduction to generalization bounds. Examples: VC-bounds Covering Number bounds Rademacher bounds Stability bounds

More information

Combinatorial Variants of Lebesgue s Density Theorem

Combinatorial Variants of Lebesgue s Density Theorem Combinatorial Variants of Lebesgue s Density Theorem Philipp Schlicht joint with David Schrittesser, Sandra Uhlenbrock and Thilo Weinert September 6, 2017 Philipp Schlicht Variants of Lebesgue s Density

More information

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product Chapter 4 Hilbert Spaces 4.1 Inner Product Spaces Inner Product Space. A complex vector space E is called an inner product space (or a pre-hilbert space, or a unitary space) if there is a mapping (, )

More information

Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012

Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012 Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 202 BOUNDS AND ASYMPTOTICS FOR FISHER INFORMATION IN THE CENTRAL LIMIT THEOREM

More information

Hölder continuity of the solution to the 3-dimensional stochastic wave equation

Hölder continuity of the solution to the 3-dimensional stochastic wave equation Hölder continuity of the solution to the 3-dimensional stochastic wave equation (joint work with Yaozhong Hu and Jingyu Huang) Department of Mathematics Kansas University CBMS Conference: Analysis of Stochastic

More information

u( x) = g( y) ds y ( 1 ) U solves u = 0 in U; u = 0 on U. ( 3)

u( x) = g( y) ds y ( 1 ) U solves u = 0 in U; u = 0 on U. ( 3) M ath 5 2 7 Fall 2 0 0 9 L ecture 4 ( S ep. 6, 2 0 0 9 ) Properties and Estimates of Laplace s and Poisson s Equations In our last lecture we derived the formulas for the solutions of Poisson s equation

More information

Gaussian Random Fields: Geometric Properties and Extremes

Gaussian Random Fields: Geometric Properties and Extremes Gaussian Random Fields: Geometric Properties and Extremes Yimin Xiao Michigan State University Outline Lecture 1: Gaussian random fields and their regularity Lecture 2: Hausdorff dimension results and

More information

Inference for High Dimensional Robust Regression

Inference for High Dimensional Robust Regression Department of Statistics UC Berkeley Stanford-Berkeley Joint Colloquium, 2015 Table of Contents 1 Background 2 Main Results 3 OLS: A Motivating Example Table of Contents 1 Background 2 Main Results 3 OLS:

More information

MARKOV PARTITIONS FOR HYPERBOLIC SETS

MARKOV PARTITIONS FOR HYPERBOLIC SETS MARKOV PARTITIONS FOR HYPERBOLIC SETS TODD FISHER, HIMAL RATHNAKUMARA Abstract. We show that if f is a diffeomorphism of a manifold to itself, Λ is a mixing (or transitive) hyperbolic set, and V is a neighborhood

More information

1 Lesson 1: Brunn Minkowski Inequality

1 Lesson 1: Brunn Minkowski Inequality 1 Lesson 1: Brunn Minkowski Inequality A set A R n is called convex if (1 λ)x + λy A for any x, y A and any λ [0, 1]. The Minkowski sum of two sets A, B R n is defined by A + B := {a + b : a A, b B}. One

More information

212a1416 Wiener measure.

212a1416 Wiener measure. 212a1416 Wiener measure. November 11, 2014 1 Wiener measure. A refined version of the Riesz representation theorem for measures The Big Path Space. The heat equation. Paths are continuous with probability

More information

Anomalous transport of particles in Plasma physics

Anomalous transport of particles in Plasma physics Anomalous transport of particles in Plasma physics L. Cesbron a, A. Mellet b,1, K. Trivisa b, a École Normale Supérieure de Cachan Campus de Ker Lann 35170 Bruz rance. b Department of Mathematics, University

More information

Large-Scale L1-Related Minimization in Compressive Sensing and Beyond

Large-Scale L1-Related Minimization in Compressive Sensing and Beyond Large-Scale L1-Related Minimization in Compressive Sensing and Beyond Yin Zhang Department of Computational and Applied Mathematics Rice University, Houston, Texas, U.S.A. Arizona State University March

More information

1 Regression with High Dimensional Data

1 Regression with High Dimensional Data 6.883 Learning with Combinatorial Structure ote for Lecture 11 Instructor: Prof. Stefanie Jegelka Scribe: Xuhong Zhang 1 Regression with High Dimensional Data Consider the following regression problem:

More information

Entropy, Inference, and Channel Coding

Entropy, Inference, and Channel Coding Entropy, Inference, and Channel Coding Sean Meyn Department of Electrical and Computer Engineering University of Illinois and the Coordinated Science Laboratory NSF support: ECS 02-17836, ITR 00-85929

More information

Lecture 2: Uniform Entropy

Lecture 2: Uniform Entropy STAT 583: Advanced Theory of Statistical Inference Spring 218 Lecture 2: Uniform Entropy Lecturer: Fang Han April 16 Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal

More information

MATHS 730 FC Lecture Notes March 5, Introduction

MATHS 730 FC Lecture Notes March 5, Introduction 1 INTRODUCTION MATHS 730 FC Lecture Notes March 5, 2014 1 Introduction Definition. If A, B are sets and there exists a bijection A B, they have the same cardinality, which we write as A, #A. If there exists

More information

Asymptotic results for empirical measures of weighted sums of independent random variables

Asymptotic results for empirical measures of weighted sums of independent random variables Asymptotic results for empirical measures of weighted sums of independent random variables B. Bercu and W. Bryc University Bordeaux 1, France Seminario di Probabilità e Statistica Matematica Sapienza Università

More information

(1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define

(1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define Homework, Real Analysis I, Fall, 2010. (1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define ρ(f, g) = 1 0 f(x) g(x) dx. Show that

More information

Rate-Optimal Detection of Very Short Signal Segments

Rate-Optimal Detection of Very Short Signal Segments Rate-Optimal Detection of Very Short Signal Segments T. Tony Cai and Ming Yuan University of Pennsylvania and University of Wisconsin-Madison July 6, 2014) Abstract Motivated by a range of applications

More information

Consistency of the maximum likelihood estimator for general hidden Markov models

Consistency of the maximum likelihood estimator for general hidden Markov models Consistency of the maximum likelihood estimator for general hidden Markov models Jimmy Olsson Centre for Mathematical Sciences Lund University Nordstat 2012 Umeå, Sweden Collaborators Hidden Markov models

More information

Numerical Solutions to Partial Differential Equations

Numerical Solutions to Partial Differential Equations Numerical Solutions to Partial Differential Equations Zhiping Li LMAM and School of Mathematical Sciences Peking University The Residual and Error of Finite Element Solutions Mixed BVP of Poisson Equation

More information

EECS 750. Hypothesis Testing with Communication Constraints

EECS 750. Hypothesis Testing with Communication Constraints EECS 750 Hypothesis Testing with Communication Constraints Name: Dinesh Krithivasan Abstract In this report, we study a modification of the classical statistical problem of bivariate hypothesis testing.

More information

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3 Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................

More information

Cones of measures. Tatiana Toro. University of Washington. Quantitative and Computational Aspects of Metric Geometry

Cones of measures. Tatiana Toro. University of Washington. Quantitative and Computational Aspects of Metric Geometry Cones of measures Tatiana Toro University of Washington Quantitative and Computational Aspects of Metric Geometry Based on joint work with C. Kenig and D. Preiss Tatiana Toro (University of Washington)

More information

Letting p shows that {B t } t 0. Definition 0.5. For λ R let δ λ : A (V ) A (V ) be defined by. 1 = g (symmetric), and. 3. g

Letting p shows that {B t } t 0. Definition 0.5. For λ R let δ λ : A (V ) A (V ) be defined by. 1 = g (symmetric), and. 3. g 4 Contents.1 Lie group p variation results Suppose G, d) is a group equipped with a left invariant metric, i.e. Let a := d e, a), then d ca, cb) = d a, b) for all a, b, c G. d a, b) = d e, a 1 b ) = a

More information

Additive functionals of infinite-variance moving averages. Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535

Additive functionals of infinite-variance moving averages. Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535 Additive functionals of infinite-variance moving averages Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535 Departments of Statistics The University of Chicago Chicago, Illinois 60637 June

More information

Approximating the Integrated Tail Distribution

Approximating the Integrated Tail Distribution Approximating the Integrated Tail Distribution Ants Kaasik & Kalev Pärna University of Tartu VIII Tartu Conference on Multivariate Statistics 26th of June, 2007 Motivating example Let W be a steady-state

More information

Small ball probabilities and metric entropy

Small ball probabilities and metric entropy Small ball probabilities and metric entropy Frank Aurzada, TU Berlin Sydney, February 2012 MCQMC Outline 1 Small ball probabilities vs. metric entropy 2 Connection to other questions 3 Recent results for

More information

A note on some approximation theorems in measure theory

A note on some approximation theorems in measure theory A note on some approximation theorems in measure theory S. Kesavan and M. T. Nair Department of Mathematics, Indian Institute of Technology, Madras, Chennai - 600 06 email: kesh@iitm.ac.in and mtnair@iitm.ac.in

More information

THE LASSO, CORRELATED DESIGN, AND IMPROVED ORACLE INEQUALITIES. By Sara van de Geer and Johannes Lederer. ETH Zürich

THE LASSO, CORRELATED DESIGN, AND IMPROVED ORACLE INEQUALITIES. By Sara van de Geer and Johannes Lederer. ETH Zürich Submitted to the Annals of Applied Statistics arxiv: math.pr/0000000 THE LASSO, CORRELATED DESIGN, AND IMPROVED ORACLE INEQUALITIES By Sara van de Geer and Johannes Lederer ETH Zürich We study high-dimensional

More information

Valuations. 6.1 Definitions. Chapter 6

Valuations. 6.1 Definitions. Chapter 6 Chapter 6 Valuations In this chapter, we generalize the notion of absolute value. In particular, we will show how the p-adic absolute value defined in the previous chapter for Q can be extended to hold

More information

10-704: Information Processing and Learning Fall Lecture 21: Nov 14. sup

10-704: Information Processing and Learning Fall Lecture 21: Nov 14. sup 0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 2: Nov 4 Note: hese notes are based on scribed notes from Spring5 offering of this course LaeX template courtesy of UC

More information

MATH Final Project Mean Curvature Flows

MATH Final Project Mean Curvature Flows MATH 581 - Final Project Mean Curvature Flows Olivier Mercier April 30, 2012 1 Introduction The mean curvature flow is part of the bigger family of geometric flows, which are flows on a manifold associated

More information

for all subintervals I J. If the same is true for the dyadic subintervals I D J only, we will write ϕ BMO d (J). In fact, the following is true

for all subintervals I J. If the same is true for the dyadic subintervals I D J only, we will write ϕ BMO d (J). In fact, the following is true 3 ohn Nirenberg inequality, Part I A function ϕ L () belongs to the space BMO() if sup ϕ(s) ϕ I I I < for all subintervals I If the same is true for the dyadic subintervals I D only, we will write ϕ BMO

More information

Quasi-conformal maps and Beltrami equation

Quasi-conformal maps and Beltrami equation Chapter 7 Quasi-conformal maps and Beltrami equation 7. Linear distortion Assume that f(x + iy) =u(x + iy)+iv(x + iy) be a (real) linear map from C C that is orientation preserving. Let z = x + iy and

More information

Empirical Risk Minimization

Empirical Risk Minimization Empirical Risk Minimization Fabrice Rossi SAMM Université Paris 1 Panthéon Sorbonne 2018 Outline Introduction PAC learning ERM in practice 2 General setting Data X the input space and Y the output space

More information

Rigidity and Non-rigidity Results on the Sphere

Rigidity and Non-rigidity Results on the Sphere Rigidity and Non-rigidity Results on the Sphere Fengbo Hang Xiaodong Wang Department of Mathematics Michigan State University Oct., 00 1 Introduction It is a simple consequence of the maximum principle

More information

Bennett-type Generalization Bounds: Large-deviation Case and Faster Rate of Convergence

Bennett-type Generalization Bounds: Large-deviation Case and Faster Rate of Convergence Bennett-type Generalization Bounds: Large-deviation Case and Faster Rate of Convergence Chao Zhang The Biodesign Institute Arizona State University Tempe, AZ 8587, USA Abstract In this paper, we present

More information

Mixing time for a random walk on a ring

Mixing time for a random walk on a ring Mixing time for a random walk on a ring Stephen Connor Joint work with Michael Bate Paris, September 2013 Introduction Let X be a discrete time Markov chain on a finite state space S, with transition matrix

More information

Concentration Inequalities

Concentration Inequalities Chapter Concentration Inequalities I. Moment generating functions, the Chernoff method, and sub-gaussian and sub-exponential random variables a. Goal for this section: given a random variable X, how does

More information

Some functional (Hölderian) limit theorems and their applications (II)

Some functional (Hölderian) limit theorems and their applications (II) Some functional (Hölderian) limit theorems and their applications (II) Alfredas Račkauskas Vilnius University Outils Statistiques et Probabilistes pour la Finance Université de Rouen June 1 5, Rouen (Rouen

More information

A Note on the Central Limit Theorem for a Class of Linear Systems 1

A Note on the Central Limit Theorem for a Class of Linear Systems 1 A Note on the Central Limit Theorem for a Class of Linear Systems 1 Contents Yukio Nagahata Department of Mathematics, Graduate School of Engineering Science Osaka University, Toyonaka 560-8531, Japan.

More information

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due ). Show that the open disk x 2 + y 2 < 1 is a countable union of planar elementary sets. Show that the closed disk x 2 + y 2 1 is a countable

More information

Functional Analysis Exercise Class

Functional Analysis Exercise Class Functional Analysis Exercise Class Week 2 November 6 November Deadline to hand in the homeworks: your exercise class on week 9 November 13 November Exercises (1) Let X be the following space of piecewise

More information

Upper Bound for Intermediate Singular Values of Random Sub-Gaussian Matrices 1

Upper Bound for Intermediate Singular Values of Random Sub-Gaussian Matrices 1 Upper Bound for Intermediate Singular Values of Random Sub-Gaussian Matrices 1 Feng Wei 2 University of Michigan July 29, 2016 1 This presentation is based a project under the supervision of M. Rudelson.

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

be the set of complex valued 2π-periodic functions f on R such that

be the set of complex valued 2π-periodic functions f on R such that . Fourier series. Definition.. Given a real number P, we say a complex valued function f on R is P -periodic if f(x + P ) f(x) for all x R. We let be the set of complex valued -periodic functions f on

More information

Math 6455 Nov 1, Differential Geometry I Fall 2006, Georgia Tech

Math 6455 Nov 1, Differential Geometry I Fall 2006, Georgia Tech Math 6455 Nov 1, 26 1 Differential Geometry I Fall 26, Georgia Tech Lecture Notes 14 Connections Suppose that we have a vector field X on a Riemannian manifold M. How can we measure how much X is changing

More information