MLISP: Machine Learning in Signal Processing Spring Lecture 8-9 May 4-7


MLISP: Machine Learning in Signal Processing, Spring 2018
Prof. Veniamin Morgenshtern
Lecture 8-9, May 4-7
Scribe: Mohamed Solomon

Agenda
1. Wavelets: beyond smoothness
2. A problem with Fourier transform
3. Short time Fourier transform
4. Wavelets: the basic idea
5. Haar wavelets

1 Wavelets: beyond smoothness

Recall the example of phoneme classification from the previous lecture. There we saw how to improve performance by using splines to smooth the data and regularize the linear classifier. Concretely, the approach can be summarized as follows. For a signal $x \in \mathbb{R}^{256}$ we would normally need 256 basis functions to represent $x$ exactly. Instead, we used only $D = 12$ basis functions $h_1(\cdot), \dots, h_{12}(\cdot)$, designed in such a way that every function in the span of $h_1(\cdot), \dots, h_{12}(\cdot)$ is smooth. The processed data is:

$$x^*_m = [H^T x]_m = \sum_{j=1}^{256} h_m(f_j)\, x_j = h_m^T x.$$

Let us decompose $x = x_{\text{smooth}} + x_{\text{non-smooth}}$, where $x_{\text{smooth}}$ is the projection of $x$ onto the span of $h_1(\cdot), \dots, h_{12}(\cdot)$, and $x_{\text{non-smooth}}$ is the projection of $x$ onto the orthogonal complement of that span. Then,

$$h_m^T x = h_m^T (x_{\text{smooth}} + x_{\text{non-smooth}}) = h_m^T x_{\text{smooth}} + \underbrace{h_m^T x_{\text{non-smooth}}}_{0} = h_m^T x_{\text{smooth}}.$$

We see that the non-smooth part of the data does not affect the coefficients $x^*_m$, and hence does not affect learning. This is filtering: the non-smooth part of the signal happens to be irrelevant for separating the "ao" and "aa" phonemes. Hence, removing that part improves the outcome.

Often, our true signal is far from smooth:

[Figure: a signal that is far from smooth]

Discontinuities are an important part of natural phenomena. For example, in images, edges carry important information:

[Figure: an image in which edges carry the information]

Wavelets are the tools that allow us to capture structure beyond smoothness and still separate out the irrelevant part.

2 A problem with Fourier transform

The Fourier transform reveals important information about a signal: its frequency content. If the function $f(t)$ is periodic on the interval $[-\pi, \pi]$, it can be written as the Fourier series:

$$f(t) = \sum_{n=-\infty}^{+\infty} c_n \phi_n(t),$$

where

$$c_n = \langle \phi_n, f \rangle = \int_{-\pi}^{\pi} \overline{\phi_n(t)}\, f(t)\, dt$$

and the basis functions are

$$\phi_n(t) = \frac{1}{\sqrt{2\pi}}\, e^{int}.$$

Importantly, the Fourier basis functions form an orthonormal basis: $\langle \phi_n, \phi_n \rangle = 1$ and $\langle \phi_n, \phi_k \rangle = 0$ if $n \neq k$. This property makes all computations and algorithms very simple. If $f(t)$ represents a piece of music, then the Fourier coefficients $\{c_n\}$ reveal which tones form that piece of music.

The problem is that the Fourier transform is a global transformation: a local change in the signal affects the Fourier transform everywhere. Concretely, consider a function with one discontinuity. The first 5 terms of the Fourier series approximate the function as follows:

[Figure: approximation by the first 5 terms of the Fourier series]

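The first 20 terms of the Fourier series approximate it as follows:

[Figure: approximation by the first 20 terms of the Fourier series]

The first 40 terms of the Fourier series approximate it as follows:

[Figure: approximation by the first 40 terms of the Fourier series]

We observe an overshoot of about 9% of the jump height near the singularity (the Gibbs phenomenon), no matter how many terms we take. In general, singularities (discontinuities in $f(t)$ or in its derivatives) cause the high-frequency coefficients ($c_n$ with large $|n|$) in the Fourier series

$$f(t) = \frac{1}{\sqrt{2\pi}} \sum_{n=-\infty}^{+\infty} c_n e^{int}$$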

to be large. Note that the singularity is at only one point, but it causes all $c_n$ (for large $|n|$) to be large. More precisely, for a jump discontinuity the coefficients decay only like $1/|n|$, much more slowly than for a smooth function. This is bad for two reasons:

(1) If there is even one singularity in the signal, then many Fourier coefficients will be large. Hence, we cannot use Fourier coefficients for efficient signal compression.

(2) The Fourier series does not directly reveal the time location of interesting features in the signal. For example, consider two signals:

[Figure: two signals containing the same frequency burst at different times]

For both signals $c_{100}$ is large, but the information about where in time the frequency burst happens is not immediately available.

3 Short time Fourier transform

One possible solution to the problem of localising frequency content in time is to use a window function $g(t)$ like this:

[Figure: a window function]

and to calculate the Fourier coefficients of the windowed function $f(t)\, g(t - k t_0)$. Specifically,

$$c_{nk} = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} f(t)\, g(t - k t_0)\, e^{-int}\, dt.$$

This is the short-time Fourier transform. It decomposes the 1D signal into a 2D time-frequency representation ($k$ indexes time, $n$ indexes frequency) like this:

[Figure: time-frequency representation]

This representation is very similar to the way people write sheet music. Such a representation is often useful. We will return to it when we discuss the Shazam algorithm. Note, however, that the functions $g_{nk}(t) = g(t - k t_0)\, e^{int}$ are not orthogonal, unlike $\phi_n(t) = \frac{1}{\sqrt{2\pi}} e^{int}$ in the Fourier series. Therefore these functions do not form a nice basis. We need something better.

4 Wavelets: the basic idea

The basic idea is to start with an appropriate basic function $h(t)$ (called the father wavelet in these notes; many texts call the generating wavelet the mother wavelet), which might look like this:

[Figure: an example of a basic function h(t)]

Then form all possible translations by integers and all possible stretchings by powers of 2 of the father wavelet:

$$h_{nk}(t) = 2^{n/2}\, h(2^n t - k).$$

Above, $2^{n/2}$ is just a normalization constant. To illustrate this construction, below we display $h(2t)$ and $h(4t - 3)$:

[Figure: h(2t) and h(4t - 3)]
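The dilation-and-translation construction above can be sketched in a few lines of Python. The triangular bump `h` is a hypothetical stand-in for the basic function (the notes do not fix its shape), and `make_hnk`, `support` and `l2_norm_sq` are illustrative helper names. The sketch only shows how $n$ and $k$ shrink and move the support, and that the factor $2^{n/2}$ keeps the $L^2$ norm unchanged.

```python
def h(t):
    """Hypothetical basic function: a triangular bump supported on [0, 1]."""
    return max(0.0, 1.0 - abs(2.0 * t - 1.0))

def make_hnk(n, k):
    """Return the function h_nk(t) = 2^(n/2) * h(2^n t - k)."""
    scale = 2.0 ** (n / 2.0)
    return lambda t: scale * h((2.0 ** n) * t - k)

def support(n, k):
    """h lives on [0, 1], so h_nk lives on [k / 2^n, (k + 1) / 2^n]."""
    return (k / 2.0 ** n, (k + 1) / 2.0 ** n)

def l2_norm_sq(f, a, b, steps=4096):
    """Midpoint-rule approximation of the integral of f(t)^2 over [a, b]."""
    dt = (b - a) / steps
    return sum(f(a + (i + 0.5) * dt) ** 2 for i in range(steps)) * dt

h_00 = make_hnk(0, 0)          # the basic function itself
h_23 = make_hnk(2, 3)          # 2 * h(4t - 3): four times narrower, shifted right
norm_00 = l2_norm_sq(h_00, *support(0, 0))
norm_23 = l2_norm_sq(h_23, *support(2, 3))
```

Here `support(2, 3)` is the interval from 0.75 to 1, and `norm_00` and `norm_23` agree up to rounding, which is the role of the normalization constant.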

Define the expansion coefficients as:

$$c_{nk} = \int dt\, f(t)\, h_{nk}(t).$$

It turns out that if $h$ is chosen properly, then the $h_{nk}(t)$ are orthogonal:

$$\langle h_{nk}, h_{n'k'} \rangle = \int h_{nk}(t)\, h_{n'k'}(t)\, dt = 0 \quad \text{unless } n = n' \text{ and } k = k'.$$

Furthermore, the $h_{nk}$ form a complete set, so that every square integrable function can be expanded as:

$$f(t) = \sum_{n,k} c_{nk}\, h_{nk}(t).$$

Indeed, by completeness, $f(t) = \sum_{n,k} a_{nk}\, h_{nk}(t)$ for some $a_{nk}$. To find the coefficients, use the orthonormality of $h_{nk}(t)$:

$$\langle f, h_{n'k'} \rangle = \sum_{n,k} a_{nk} \langle h_{nk}, h_{n'k'} \rangle = a_{n'k'},$$

so $a_{nk} = c_{nk}$. This is just like in the case of Fourier series.

There are major advantages compared to Fourier series. For high frequencies ($n$ large), the functions $h_{nk}(t)$ have good localization (they get thinner as $n \to \infty$). The location of high-frequency components can be seen from wavelet analysis, but not from Fourier series.

Next we consider the simplest possible wavelet construction: Haar wavelets.
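The coefficient formula above has a direct finite-dimensional analogue, which may make the argument easier to follow. In the sketch below, vectors in $\mathbb{R}^3$ stand in for functions and the orthonormal basis is chosen arbitrarily for illustration (it is not a wavelet basis); the names `inner`, `basis` and `reconstruction` are assumptions of this example.

```python
import math

def inner(u, v):
    """Euclidean inner product, the discrete counterpart of the integral <u, v>."""
    return sum(a * b for a, b in zip(u, v))

s = 1.0 / math.sqrt(2.0)
basis = [
    (s, s, 0.0),       # an arbitrary orthonormal basis of R^3
    (s, -s, 0.0),
    (0.0, 0.0, 1.0),
]

f = (3.0, 1.0, -2.0)

# Coefficients are inner products with the basis vectors: c_i = <f, h_i>.
coeffs = [inner(f, b) for b in basis]

# Summing c_i * h_i recovers f, because the basis is orthonormal and complete.
reconstruction = tuple(
    sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(3)
)
```

Up to rounding, `reconstruction` equals `f`, mirroring $f = \sum_{n,k} c_{nk} h_{nk}$.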

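5 Haar wavelets

5.1 The pixel approximation spaces

Suppose we have a pixel function

$$\phi(t) = \begin{cases} 1, & \text{if } 0 \le t < 1 \\ 0, & \text{otherwise.} \end{cases}$$

We wish to build all other functions out of $\phi(t)$ and its translates $\phi(t - k)$:

[Figure: the pixel function and its translates]

Note that any function that is constant on integer intervals can be written as a linear combination of the $\phi(t - k)$, for example:

$$f(t) = 2\phi(t) + 3\phi(t - 1) + 2\phi(t - 2) + 4\phi(t - 3). \qquad (1)$$

Given a function $f(t)$ that is not constant on integer intervals, we can approximate it by linear combinations of the $\phi(t - k)$:

[Figure: a function and its approximation by pixels]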
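As a rough sketch of the pixel approximation just described, the Python below evaluates linear combinations of $\phi(t - k)$ and builds one for a given function. The notes do not say how to choose the coefficients; taking $a_k$ to be the average of $f$ over $[k, k+1)$ is an assumption made here (it is the least-squares choice). The helper names and the test function $t^2$ are likewise illustrative.

```python
def phi(t):
    """Pixel function: 1 on [0, 1), 0 elsewhere."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0

def approximate(coeffs, t):
    """Evaluate sum_k a_k * phi(t - k) for coefficients given as {k: a_k}."""
    return sum(a * phi(t - k) for k, a in coeffs.items())

def pixel_coefficients(f, k_min, k_max, samples=1000):
    """Assumed choice: a_k = average of f over [k, k + 1), via the midpoint rule."""
    coeffs = {}
    for k in range(k_min, k_max):
        total = sum(f(k + (i + 0.5) / samples) for i in range(samples))
        coeffs[k] = total / samples
    return coeffs

# Equation (1): f(t) = 2 phi(t) + 3 phi(t - 1) + 2 phi(t - 2) + 4 phi(t - 3).
eq1 = {0: 2.0, 1: 3.0, 2: 2.0, 3: 4.0}
value_on_second_pixel = approximate(eq1, 1.5)

# Staircase approximation of a function that is not piecewise constant.
coeffs = pixel_coefficients(lambda t: t * t, 0, 4)
```

For $f(t) = t^2$ the exact average over $[k, k+1)$ is $k^2 + k + \tfrac13$, so `coeffs[2]` is close to 6.333.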

Let $V_0$ denote the set of all square integrable functions that are constant on integer intervals. Every element $g \in V_0$ can be written as a linear combination of integer translates of the pixel $\phi$:

$$g(t) = \sum_k a_k\, \phi(t - k). \qquad (2)$$

The elements of $V_0$ look like this:

[Figure: an element of V_0]

To get a better approximation, shrink the basic function:

[Figure 1: $\phi(t)$, $\phi(2t)$, $\phi(2^2 t)$]

In detail, $\phi(2t)$ looks like this:

[Figure: $\phi(2t)$]

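and $\phi(2t - 1)$ looks like this:

[Figure: $\phi(2t - 1)$]

To see this, note that $2t = 1$ iff $t = \frac{1}{2}$, and $2t - 1 = 1$ iff $t = 1$. Hence $\phi(2t)$ is supported on $[0, \frac{1}{2})$ and $\phi(2t - 1)$ on $[\frac{1}{2}, 1)$. The smaller functions $\phi(2t - k)$ give a better approximation to $f(t)$:

[Figure: approximation of f(t) by the functions $\phi(2t - k)$]

Let $V_1$ denote the set of all square integrable functions that are constant on all half-integer intervals. Every element $g \in V_1$ can be written as a linear combination of the functions $\phi(2t - k)$, $k \in \mathbb{Z}$, i.e. of half-integer translates of the pixel $\phi(2t)$:

$$g(t) = \sum_k a_k\, \phi(2t - k). \qquad (3)$$

The elements of $V_1$ look like this:

[Figure: an element of V_1]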

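We can continue in an analogous fashion. The elements of $V_2$ look like this:

[Figure: an element of V_2]

In general, let $V_j$ denote the set of all square integrable functions that are constant on intervals of length $2^{-j}$. Every element $g \in V_j$ can be written as a linear combination of the functions $\phi(2^j t - k)$, $k \in \mathbb{Z}$:

$$g(t) = \sum_k a_k\, \phi(2^j t - k).$$

5.2 The general theory

The pixel function $\phi(\cdot)$ generates the approximation spaces we need. However, its scaled versions are not orthogonal to each other across scales: for example, $\langle \phi(t), \phi(2t) \rangle = \frac{1}{2} \neq 0$. We fix this problem next. Define the father function (called the mother wavelet in many texts):

$$\psi(t) = \begin{cases} 1, & \text{if } 0 \le t < \frac{1}{2} \\ -1, & \text{if } \frac{1}{2} \le t < 1 \\ 0, & \text{otherwise.} \end{cases}$$

This function looks like this:

[Figure: the function $\psi(t)$]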

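It is important not to confuse $\psi(\cdot)$ and $\phi(\cdot)$! From the father function we generate the family of Haar wavelets by integer translation, for example $\psi_{0,5}(t) = \psi(t - 5)$:

[Figure: $\psi_{0,5}$]

and scaling, for example $\psi_{3,7}(t) = 2^{3/2}\, \psi(2^3 t - 7)$:

[Figure: $\psi_{3,7}$]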

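In general,

$$\psi_{jk}(t) = 2^{j/2}\, \psi(2^j t - k).$$

It is not difficult to see that Haar wavelets are orthogonal across scales and within the same scale:

$$\langle \psi_{jk}, \psi_{j'k'} \rangle = \int_{-\infty}^{+\infty} dt\, \psi_{jk}(t)\, \psi_{j'k'}(t) = 0 \quad \text{if } j \neq j' \text{ or } k \neq k'.$$

Indeed:

If $j = j'$ and $k \neq k'$, then $\langle \psi_{jk}, \psi_{jk'} \rangle = 0$ because $\psi_{jk}(t) = 0$ for those $t$ for which $\psi_{jk'}(t) \neq 0$, and vice versa.

If $j \neq j'$, then $\langle \psi_{jk}, \psi_{j'k'} \rangle = \int \psi_{jk}(t)\, \psi_{j'k'}(t)\, dt = 0$: either the supports are disjoint, or the finer wavelet sits inside a region where the coarser one is constant, and the finer wavelet integrates to zero. This can be seen from the figure:

[Figure: two Haar wavelets at different scales]

Can every function be represented as a combination of Haar wavelets? Recall that $V_j$ is the space of square integrable functions constant on dyadic intervals of length $2^{-j}$. The elements of this space can be written as $\sum_k a_k\, \phi(2^j t - k)$. If $j$ is negative, the intervals are of length greater than 1. Concretely, $V_{-1}$ contains functions constant on intervals of length 2; $V_{-2}$ contains functions constant on intervals of length 4, etc. We will next list the properties of $\{V_j\}$:

1. If a function is piecewise constant on integer intervals then it is piecewise constant on half-integer intervals. Therefore,
$$\dots \subset V_{-2} \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset V_3 \subset \dots$$

2. It can be shown that $\bigcap_n V_n = \{0\}$ (the constant zero function).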

3. Every function can be approximated by a staircase function arbitrarily well if the width of the steps is small enough. Therefore, $\bigcup_n V_n$ is dense in the space of square integrable functions.

4. Take a function that is constant on intervals of length $2^{-n}$. Shrink it by a factor of 2. The result is constant on intervals of length $2^{-n-1}$. Therefore, if $f(t) \in V_n$ then $f(2t) \in V_{n+1}$.

5. Translating a function by an integer does not change the fact that it is constant on integer intervals. Therefore, if $f(t) \in V_0$ then $f(t - k) \in V_0$.

6. The family of functions $\phi_{0k}(t) = \phi(t - k)$ forms an orthonormal basis for $V_0$. The function $\phi$ is called the scaling function.

Definition 1. A sequence of spaces $\{V_j\}$ together with a scaling function $\phi$ that generates $V_0$ so that (1)-(6) are satisfied is called a multiresolution analysis.

Definition 2. Assume $M_1$ and $M_2$ are orthogonal subspaces, i.e. $w_1 \perp w_2$ for all $w_1 \in M_1$ and $w_2 \in M_2$. The subspace $V$ is the orthogonal direct sum of $M_1$ and $M_2$, denoted $V = M_1 \oplus M_2$, if every $v \in V$ can be written uniquely as

$$v = w_1 + w_2$$

with $w_1 \in M_1$ and $w_2 \in M_2$.

Now consider again our hierarchy of subspaces

$$\dots \subset V_{-2} \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset V_3 \subset \dots$$

Since $V_0 \subset V_1$, there is a subspace $W_0$ such that $V_0 \oplus W_0 = V_1$; we denote this subspace by $W_0 = V_1 \ominus V_0$. Similarly define $W_1 = V_2 \ominus V_1$ and, in general, $W_{j-1} = V_j \ominus V_{j-1}$. We have:

$$V_3 = V_2 \oplus W_2 = V_1 \oplus W_1 \oplus W_2 = V_0 \oplus W_0 \oplus W_1 \oplus W_2 = V_{-1} \oplus W_{-1} \oplus W_0 \oplus W_1 \oplus W_2 = \dots$$

and therefore for $v_3 \in V_3$:

$$v_3 = v_2 + w_2 = v_1 + w_1 + w_2 = v_0 + w_0 + w_1 + w_2 = v_{-1} + w_{-1} + w_0 + w_1 + w_2 = \dots$$

with $v_i \in V_i$ and $w_i \in W_i$. In conclusion, the relationship between $\{V_j\}$ and $\{W_j\}$ looks like this:

[Figure: the nested spaces $V_j$ and their complements $W_j$]
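The telescoping decomposition $v_3 = v_0 + w_0 + w_1 + w_2$ can be illustrated numerically. The sketch below represents a function in $V_3$ restricted to $[0, 1)$ by its 8 cell values, and assumes that the component $v_j$ is obtained by averaging neighbouring cells, which is the orthogonal projection onto $V_j$. The sample values and the helper names `coarsen` and `upsample` are hypothetical.

```python
def coarsen(values):
    """Project from V_{j+1} to V_j: average each pair of neighbouring cells."""
    return [(values[i] + values[i + 1]) / 2.0 for i in range(0, len(values), 2)]

def upsample(values, length):
    """Write a coarse staircase on a finer grid by repeating each value."""
    repeat = length // len(values)
    return [v for v in values for _ in range(repeat)]

# Hypothetical element of V_3 on [0, 1): constant on 8 cells of width 1/8.
v3 = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 8.0]
n = len(v3)

# levels[j] holds the cell values of v_j, the component in V_j.
levels = {3: v3}
for j in (2, 1, 0):
    levels[j] = coarsen(levels[j + 1])

# Detail w_j = v_{j+1} - v_j, with every function written on the finest grid.
fine = {j: upsample(levels[j], n) for j in levels}
details = {j: [a - b for a, b in zip(fine[j + 1], fine[j])] for j in (0, 1, 2)}

# v_3 = v_0 + w_0 + w_1 + w_2
rebuilt = [fine[0][i] + details[0][i] + details[1][i] + details[2][i]
           for i in range(n)]
```

Each `details[j]` takes equal and opposite values on the two halves of every cell of width $2^{-j}$, and `rebuilt` reproduces `v3`.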

How can we characterize the $W_j$ spaces?

Claim 3. $W_0$ is the set of square integrable functions that are constant on half-integer intervals and take equal and opposite values on the two halves of each integer interval.

For example, here is an element of $W_0$:

[Figure: an element of W_0]

Proof. Let $A$ be the set of functions that are constant on half-integer intervals and take equal and opposite values on the two halves of each integer interval. We need to show that $W_0 = A$. To do so, we show that $V_0 \oplus A = V_1$ and that $V_0$ and $A$ are orthogonal. Then it will follow that

$$A = V_1 \ominus V_0 = W_0. \qquad (4)$$

First, take $f \in V_0$ and $g \in A$; these functions look like this:

[Figure: a function f in V_0 and a function g in A]

Thus it can be directly verified that $\langle f, g \rangle = \int_{-\infty}^{+\infty} f(t)\, g(t)\, dt = 0$ because $f(t)\, g(t)$ takes on

equal and opposite values on each half of every integer interval, and so integrates to 0 on each integer interval.

Second, for every $f \in V_1$ there exist $f_0 \in V_0$ and $g_0 \in A$ such that $f = f_0 + g_0$. Indeed, take $f \in V_1$. It is constant on the half-integer intervals:

[Figure: a function f in V_1]

Define $f_0$ to be the function that is constant on each integer interval and whose value is the average of the two values of $f$ on that interval:

[Figure: the function f_0]

Then $f_0$ is constant on integer intervals, and so $f_0 \in V_0$. Now define $g_0(t) = f(t) - f_0(t)$. Clearly $g_0$ takes on equal and opposite values on each half of every integer interval, and so $g_0 \in A$. To summarise, we have $f(t) = f_0(t) + g_0(t)$ where $f_0 \in V_0$ and $g_0 \in A$. Therefore $V_1 = V_0 \oplus A$, and hence $A = W_0$.

Similarly, we can show that $W_j$ is the space of square integrable functions that are constant on dyadic intervals of length $2^{-j-1}$ and take on equal and opposite values on the two halves of each dyadic interval of length $2^{-j}$.

To summarise, we designed the following structure:

[Figure: summary of the spaces $V_j$ and $W_j$]
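The two steps of the proof can be checked on a small example. The sketch below stores a function in $V_1$ as its values on consecutive half-integer cells, splits it into $f_0$ and $g_0$ by the averaging rule from the proof, and evaluates the inner products exactly (staircase functions make the integrals finite sums). The sample values and the names `split` and `inner` are hypothetical.

```python
def split(f_half):
    """Split f in V_1 (values on half-integer cells) into f0 in V_0 and g0 in A."""
    f0, g0 = [], []
    for i in range(0, len(f_half), 2):
        mean = (f_half[i] + f_half[i + 1]) / 2.0   # average over one integer interval
        f0 += [mean, mean]
        g0 += [f_half[i] - mean, f_half[i + 1] - mean]
    return f0, g0

def inner(u, v, cell=0.5):
    """Inner product of two staircase functions whose cells have width `cell`."""
    return cell * sum(a * b for a, b in zip(u, v))

# Hypothetical f in V_1 on [0, 3): six half-integer cells.
f = [2.0, 6.0, -1.0, 3.0, 5.0, 5.0]
f0, g0 = split(f)

orthogonality = inner(f0, g0)                    # <f0, g0>
energy_gap = inner(f, f) - (inner(f0, f0) + inner(g0, g0))
```

Both `orthogonality` and `energy_gap` come out as zero here: $f_0 \perp g_0$, and the energy of $f$ splits as $\|f\|^2 = \|f_0\|^2 + \|g_0\|^2$, as expected for an orthogonal direct sum.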

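Every function with $\frac{1}{16}$ quantization:

[Figure: a function on the unit interval quantized in steps of 1/16]

can be decomposed into 8 functions in $W_3$, plus $8 = 4 + 2 + 1 + 1$ in lower layers (4 in $W_2$, 2 in $W_1$, 1 in $W_0$ and 1 in $V_0$). Typically the representation will be sparse. For example, consider the function we started with:

[Figure: the function with one discontinuity]

The representation is sparse because most of the coefficients are zero at a high level of detail. Concretely, consider $\psi_{3,i} \in W_3$:

[Figure: the wavelets $\psi_{3,i}$ against the function f]

Note that $\langle f, \psi_{3,i} \rangle = 0$ for all $i \neq 3, 4$. Only for $i = 3, 4$ can $\langle f, \psi_{3,i} \rangle \neq 0$. Therefore, one singularity in the function affects only a few coefficients, unlike in the Fourier transform case.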
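The sparsity claim can be sketched for a staircase function on $[0, 1)$ with 16 cells. The code below computes the Haar coefficients $\langle f, \psi_{j,i} \rangle$ level by level, using the fact that on its two halves $\psi_{j,i}$ equals $\pm 2^{j/2}$. The test function, a single jump at $t = 7/16$, is a hypothetical example and not the function shown in the notes, so the indices of the nonzero coefficients differ from those quoted above; `haar_details` is an illustrative helper name.

```python
import math

def haar_details(samples):
    """Haar coefficients of a staircase f on [0, 1) with len(samples) = 2^J cells.

    Returns ({j: [<f, psi_{j,i}> for each i]}, average of f over [0, 1)).
    """
    details = {}
    values = list(samples)
    j = int(math.log2(len(values))) - 1
    while len(values) > 1:
        width = 1.0 / len(values)        # width of one cell at the current resolution
        amplitude = 2.0 ** (j / 2.0)     # psi_{j,i} equals +/- 2^(j/2) on its two halves
        details[j] = [amplitude * width * (values[2 * i] - values[2 * i + 1])
                      for i in range(len(values) // 2)]
        # Averaging pairs projects onto the next coarser space V_j.
        values = [(values[2 * i] + values[2 * i + 1]) / 2.0
                  for i in range(len(values) // 2)]
        j -= 1
    return details, values[0]

# Hypothetical f: equal to 1 on [0, 7/16) and 3 on [7/16, 1), sampled on 16 cells.
f = [1.0] * 7 + [3.0] * 9
details, average = haar_details(f)

nonzero_w3 = [i for i, c in enumerate(details[3]) if c != 0.0]
nonzero_total = sum(1 for level in details.values() for c in level if c != 0.0)
```

For this function only one of the 8 coefficients in $W_3$ is nonzero, and only 4 of the 15 wavelet coefficients overall (one per level), so the single jump touches just the wavelets whose support contains it.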