Introduction to Orthogonal Transforms with Applications in Data Processing and Analysis


Introduction to Orthogonal Transforms with Applications in Data Processing and Analysis

Ruye Wang

March 28, 2011


Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press, The Edinburgh Building, Cambridge CB2 2RU, UK
Published in the United States of America by Cambridge University Press, New York
Information on this title:

© Cambridge University Press 2011

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2011
Printed in the United Kingdom at the University Press, Cambridge

A catalogue record for this publication is available from the British Library
Library of Congress Cataloguing in Publication data

ISBN-13 978-0-521-XXXXX-X hardback
ISBN-10 0-521-XXXXX-X hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

To my parents

Contents

Preface
Notation

1 Signals and Systems
   1.1 Continuous and Discrete Signals
   1.2 Unit Step and Nascent Delta Functions
   1.3 Relationship Between Complex Exponentials and Delta Functions
   1.4 Attributes of Signals
   1.5 Signal Arithmetics and Transformations
   1.6 Linear and Time Invariant Systems
   1.7 Signals Through LTI Systems (Continuous)
   1.8 Signals Through LTI Systems (Discrete)
   1.9 Continuous and Discrete Convolutions
   Homework Problems

2 Vector Spaces and Signal Representation
   Inner Product Space
      Vector Space
      Inner Product Space
      Bases of Vector Space
      Signal Representation by Orthogonal Bases
      Signal Representation by Standard Bases
      An Example: the Fourier Transforms
   Unitary Transformation and Signal Representation
      Linear Transformation
      Eigenvalue Problems
      Eigenvectors of D^2 as Fourier Basis
      Unitary Transformations
      Unitary Transformations in N-D Space
   Projection Theorem and Signal Approximation
      Projection Theorem and Pseudo-Inverse
      Signal Approximation
   Frames and Biorthogonal Bases

      Frames
      Signal Expansion by Frames and Riesz Bases
      Frames in Finite-Dimensional Space
   Kernel Function and Mercer's Theorem
   Summary
   Homework Problems

3 Continuous-Time Fourier Transform
   The Fourier Series Expansion of Periodic Signals
      Formulation of the Fourier Expansion
      Physical Interpretation
      Properties of the Fourier Series Expansion
      The Fourier Expansion of Typical Functions
   The Fourier Transform of Non-Periodic Signals
      Formulation
      Relation to the Fourier Expansion
      Properties of the Fourier Transform
      Fourier Spectra of Typical Functions
      The Uncertainty Principle
   Homework Problems

Bibliography

Preface

What Is the Book All About?

When a straight line standing on a straight line makes the adjacent angles equal to one another, each of the equal angles is right, and the straight line standing on the other is called a perpendicular to that on which it stands.
Euclid, Elements, Book 1, Definition 10

This is Euclid's definition of perpendicular, a synonym of the word orthogonal used in the title of this book. Although the meaning of this word has been generalized since Euclid's time to describe the relationship between two functions as well as two vectors, which is what we will mostly be concerned with in this book, these are essentially no different from two perpendicular straight lines, as discussed by Euclid some 23 centuries ago.

Orthogonality is of great significance not only in geometry and mathematics, but also in science and engineering in general, and in data processing and analysis in particular. This book is about a set of mathematical and computational methods, known collectively as the orthogonal transforms, that enables us to take advantage of the orthogonal axes of the space in which the data reside. As we will see throughout the book, such orthogonality is a much desired property that keeps things untangled and nicely separated for ease of manipulation, and an orthogonal transform can rotate a signal, represented as a vector in a Euclidean or, more generally, Hilbert space, in such a way that the signal components tend to become, approximately or accurately, orthogonal to each other. Such orthogonal transforms, typified by the well-known Fourier transform, lend themselves well to various data processing and analysis needs, and are therefore used in a wide variety of disciplines and areas, including both social and natural sciences as well as engineering. The book also covers the Laplace and Z transforms, which can be considered as extended versions of the Fourier transform for continuous and discrete functions respectively, and the wavelet transforms, which may not be strictly orthogonal but are still closely related to those that are.

In the last few decades the scale of data collection across almost all fields has been increasing drastically, due mostly to rapid advances in technology. Consequently, how best to make sense of the fast-accumulating data has become more challenging than ever. Wherever a large amount of data is collected, from stock market indices in economics to microarray data in bioinformatics, from seismic

data in geophysics to audio and video data in communication and broadcasting engineering, there is always the need to process, analyze, and compress the data in some meaningful way for the purpose of effective and efficient data transmission, interpretation, and storage, by various computational methods and algorithms. The transform methods discussed in this book can be used as a set of basic tools for data processing and for subsequent analysis such as data mining, knowledge discovery, and machine learning.

The specific purpose of each data processing and analysis task at hand may vary from case to case. From a set of given data, one may desire to remove a certain type of noise, extract a particular kind of feature of interest, and/or reduce the quantity of the data without losing useful information for storage and transmission. On the other hand, many operations needed for achieving these very different goals may all be carried out using the same mathematical tool of the orthogonal transform, by which the data is manipulated and represented in such a way that the desired results can be achieved effectively in the subsequent stage. To address all such needs, this book presents a thorough introduction to the mathematical background common to these transform methods, and provides a repertoire of computational algorithms for these methods.

The basic approach of the book is the combination of theoretical derivation and practical implementation for each of the transform methods considered. Certainly many existing books touch upon the topics of both orthogonal and wavelet transforms, from either a mathematical or an engineering point of view. Some of them concentrate on the theories of these methods, while others emphasize their applications, but relatively few guide the reader directly from the mathematical theories to the computational algorithms, and then to their applications to real data analysis, as this book intends to do. Here deliberate efforts are made to bridge the gap between the theoretical background and the practical implementation, based on the belief that to truly understand a certain method, one ultimately needs to be able to convert the mathematical theory into computer code so that the algorithms can actually be implemented and tested. This idea has been the guiding principle throughout the writing of the book. For each of the methods covered, we will first derive the theory mathematically, then present the corresponding computational algorithm, and finally provide the necessary code segments in Matlab or C for the key parts of the algorithm. Moreover, we will also include some relatively simple application examples to illustrate the actual data processing effects of the algorithm. In fact, every one of the transform methods considered in the book has been implemented in either the Matlab or C programming language and tested on real data. The complete programs are also made readily available on the CD attached to the book, as well as on a website dedicated to the book at: The reader is encouraged and expected to try these algorithms out by running the code on his/her own data.

Why Orthogonal Transforms?

The transform methods covered in the book are a collection of both old and new ideas, ranging from the classical Fourier series expansion that goes back almost 200 years, to some relatively recent thoughts such as the various origins of the wavelet transform. While all of these ideas were originally developed with different goals and applications in mind, from solving the heat equation to the analysis of seismic data, they can all be considered to belong to the same family, based on the common mathematical framework they all share and their similar applications in data processing and analysis. The discussions of specific methods and algorithms in the chapters will all be approached from such a unified point of view. Before the specific discussion of each of the methods, let us first address a fundamental issue: why do we need to carry out an orthogonal transform to start with?

As the measurement of a certain variable, e.g., the temperature, of a physical process, a signal tends to vary continuously and smoothly, as the energy associated with the physical process is most probably distributed relatively evenly in both space and time. Most such spatial or temporal signals are likely to be correlated, in the sense that, given the value of a signal at a certain point in space or time, one can predict with reasonable confidence that the signal at a neighboring point will take a similar value. Such everyday experience is due to the fundamental nature of the physical world, governed by the principles of minimum energy and maximum entropy, in which abrupt changes and discontinuities, typically caused by an energy surge of some kind, are relatively rare and unlikely events (except in the microscopic world governed by quantum mechanics). On the other hand, from the signal processing viewpoint, high signal correlation and even energy distribution are not desirable in general, as it is difficult to decompose such a signal, as needed in various applications such as information extraction, noise reduction, and data compression. The issue therefore becomes how the signal can be converted in such a way that it is less correlated and its energy less evenly distributed, and to what extent such a conversion can be carried out to achieve the goal.

Specifically, in order to represent, process, and analyze a signal, it needs to be decomposed into a set of components along a certain dimension. While typically a signal is represented by default as a continuous or discrete function of time or space, it may be desirable to represent it along some alternative dimension, most commonly frequency, so that it can be processed and analyzed more effectively and conveniently. Viewed mathematically, a signal is a vector in some vector space which can be represented by any of a set of different orthogonal bases all spanning the same space. Each representation corresponds to a different decomposition of the signal. Moreover, all such representations are equivalent in the sense that they are related to each other by certain rotations in the space, by which the total energy or information contained in the signal is conserved. From this point of view, all the different orthogonal transform methods developed in the last two hundred years by mathematicians, scientists, and engineers for

various purposes can be unified to form a family of methods for the same general purpose. While all transform methods are equivalent, as they all conserve the total energy or information of the signal, they can be very different in terms of how the total energy or information in the signal is redistributed among its components after the transform, and how much these components are correlated. If, after a properly chosen orthogonal transform, the signal is represented in such a way that its components are decorrelated and most of the signal information of interest is concentrated in a small subset of its components, then the remaining components can be neglected as they carry little information. This simple idea is essentially the answer to the question asked above of why an orthogonal transform is needed, and it is actually the foundation of the general orthogonal transform method for feature selection, data compression, and noise reduction. In a certain sense, once a proper basis of the space is chosen so that the signal is represented in such a favorable manner, the signal-processing goal is already achieved to a significant extent.

What Is in the Chapters?

The purpose of the first two chapters is to establish a solid mathematical foundation for a thorough understanding of the topics of the subsequent chapters, each discussing a specific type of transform method. Chapter 1 is a brief summary of the basic concepts of signals and linear time-invariant (LTI) systems. For readers with an engineering background, much of this chapter may be a quick review that could be scanned through or even skipped. For others this chapter serves as an introduction to the mathematical language by which the signals and systems will be described in the following chapters.

Chapter 2 sets up the stage for all transform methods by introducing the key concepts of the vector space, or more strictly speaking Hilbert space, and the linear transformations in such a space. Here the usual N-dimensional space is generalized in several aspects: (1) the dimension N of the space may be extended to infinity; (2) a vector space may also include a function space composed of all continuous functions satisfying certain conditions; and (3) the basis vectors of a space may become uncountable. The mathematics needed for a rigorous treatment of these much-generalized spaces is likely to be beyond the comfort zone of most readers with a typical engineering or science background, and it is therefore also beyond the scope of this book. The emphasis of the discussion here is not mathematical rigor, but the basic understanding and realization that many of the properties of these generalized spaces are just natural extensions of those of the familiar N-D vector space. The purpose of such discussions is to establish a common foundation for all transform methods so that they can all be studied from a unified point of view, namely: any given signal, either continuous or discrete, with either finite or infinite duration, can be treated as a vector in a certain space and represented differently by any of a variety of orthogonal transform methods, each corresponding to one of the orthogonal bases that span the space. Moreover, all of these different representations are related to each

other by rotations in the space. Such basic ideas may also be extended to non-orthogonal (e.g., biorthogonal) bases that are used in wavelet transforms. All transform methods considered in later chapters will be studied in light of such a framework. Although it is highly recommended for the reader to at least read through the material in the first two chapters, those who find it difficult to follow the discussions thoroughly could skip them and move on to the following chapters, as many of the topics can be studied relatively independently, and one can always come back to learn some of the concepts in the first two chapters when needed.

In Chapters 3 and 4, we study the classical Fourier methods for continuous and discrete signals respectively. Fourier's theory is mathematically beautiful and is referred to as a "mathematical poem", and it has great significance throughout a wide variety of disciplines in practice as well as in theory. While the general topic of the Fourier transform is covered in a large number of textbooks in various fields such as engineering, physics, and mathematics, here a not-so-conventional approach is adopted to treat all Fourier-related methods from a unified point of view. Specifically, the Fourier series (FS) expansion, the continuous- and discrete-time Fourier transforms (CTFT and DTFT), and the discrete Fourier transform (DFT) will be considered as four different variations of the same general Fourier transform, corresponding to the four combinations of the two basic categories of signals: continuous versus discrete, periodic versus non-periodic. By doing so, many of the dual and symmetrical relationships among these four different forms, and between the time and frequency domains of the Fourier transform, can be much more clearly and conveniently presented and understood.

Chapter 5 discusses the Laplace and Z transforms. Strictly speaking, these transforms do not belong to the family of orthogonal transforms, which convert a 1-dimensional signal of time t into another 1-dimensional function along a different variable, typically frequency f or angular frequency ω = 2πf. Instead, the Laplace transform converts a 1-D continuous signal from the time domain into a function on a 2-D complex plane s = σ + jω, and the Z transform converts a 1-D discrete signal from the time domain into a function on a 2-D complex plane z = e^s. However, as these transforms are respectively the natural extensions of the continuous and discrete-time Fourier transforms, and are widely used in signal processing and system analysis, they are included in the book as two extra tools in our toolbox.

Chapter 6 discusses the Hartley and sine/cosine transforms, both of which are closely related to the Fourier transform. As real transforms, both the Hartley and sine/cosine transforms have the advantage of reduced computational cost when compared with the complex Fourier transform. If the signal in question is real, with zero imaginary part, then half of the computation in its Fourier transform is redundant and therefore wasted. This redundancy is avoided by a real transform such as the cosine transform, which is widely used for data compression, as in the image compression standard JPEG.

Chapter 7 combines three transform methods, the Walsh-Hadamard, slant, and Haar transforms, all sharing some similar characteristics, i.e., the basis functions

associated with these transforms all have square-wave-like waveforms. Moreover, as the Haar transform also possesses the basic characteristics of the wavelet transform method, it can serve as a bridge between the two camps of the orthogonal transforms and the wavelet transforms, leading to a natural transition from the former to the latter.

In Chapter 8 we discuss the Karhunen-Loeve transform (KLT), which can be considered as a capstone of all previously discussed transform methods, and the associated principal component analysis (PCA), which is widely used in many data processing applications. The KLT is the optimal transform method among all orthogonal transforms in terms of the two main characteristics of the general orthogonal transform method, namely the compaction of signal energy and the decorrelation among all signal components. In this regard, all orthogonal transform methods can be compared against the optimal KLT for an assessment of their performance.

We next consider in Chapter 9 both the continuous and discrete-time wavelet transforms (CTWT and DTWT), which differ from all orthogonal transforms discussed previously in two main aspects. First, the wavelet transforms are not strictly orthogonal, as the bases used to span the vector space and to represent a given signal may not necessarily be orthogonal. Second, the wavelet transform converts a 1-D time signal into a 2-D function of two variables, one for different levels of detail or scales, corresponding to different frequencies in the Fourier transform, and the other for different temporal positions, which is totally absent in the Fourier or any other orthogonal transform. While redundancy is inevitably introduced into the 2-D transform domain by such a wavelet transform, the additional second dimension also enables the transform to achieve both temporal and frequency localities in signal representation at the same time (while all other transform methods can achieve only one of the two localities). Such a capability of the wavelet transform is its main advantage over orthogonal transforms in some applications such as signal filtering.

Finally, in Chapter 10, we introduce the basic concept of multiresolution analysis (MRA), and Mallat's fast algorithm for the discrete wavelet transform (DWT) together with its filter bank implementation. Similar to the orthogonal transforms, this algorithm converts a discrete signal of size N into a set of DWT coefficients also of size N, from which the original signal can be perfectly reconstructed, i.e., there is no redundancy introduced by the DWT. However, different from the orthogonal transforms, the DWT coefficients represent the signal with temporal as well as frequency (levels of detail) localities, and can therefore be more advantageous in some applications such as data compression.

Moreover, some fundamental results in linear algebra and statistics are also summarized in the two appendices at the back of the book.

Who Are the Intended Readers?

The book can be used as a textbook for either an undergraduate or graduate course in digital signal processing, communication, or other related areas. In such a classroom setting, all orthogonal transform methods can be systematically

studied following a thorough introduction of the mathematical background common to these methods. The mathematics prerequisite is no more than basic calculus and linear algebra.

Moreover, the book can also be used as a reference by practicing professionals in both the natural and social sciences, as well as in engineering. A financial analyst or a biologist may need to learn how to effectively analyze and interpret his/her data, a database designer may need to know how to compress his data before storing them in the database, and a software engineer may need to learn the basic data processing algorithms while developing a software tool in the field. In general, anyone who deals with a large quantity of data may desire to gain some basic knowledge in data processing, regardless of his/her background and specialty. In fact, the book has been developed with such potential readers in mind.

Possibly due to personal experience, the author has always felt that self-learning (or, to borrow a machine learning terminology, "unsupervised learning") is no less important than formal classroom learning. One may have been out of school for some years but still feel the need to update and expand his/her knowledge. Such readers could certainly study whichever chapters are of interest, instead of systematically reading through each chapter from beginning to end. They can also skip certain mathematical derivations, which are included in the book for completeness (and for those who feel comfortable only if the complete proofs and derivations of all conclusions are provided). For some readers, neglecting much of the mathematical discussion of a specific transform method should be just fine if the basic ideas regarding the method and its implementation are understood. It is hoped that the book can serve as a toolbox, as well as a textbook, from which certain transform methods of interest can be learned and applied, in combination with the reader's expertise in his/her own field, to solving the specific data processing/analysis problems at hand.

About the Homework Problems and Projects

Understanding the transform methods and the corresponding computational algorithms is not all. Eventually they all need to be implemented and realized by either software or hardware, specifically by computer code of some sort. This is why the book emphasizes the algorithms and coding as well as the theoretical derivations, and many homework problems and projects require certain basic coding skills, such as some knowledge of Matlab. However, being able to code is not expected of all readers. Those who may not need or wish to learn coding can by all means skip the relevant sections in the text as well as those homework problems involving software programming. However, all readers are encouraged to at least run some of the Matlab functions provided to see the effects of the transform methods. (There are a lot of such Matlab m-files on the website of the book. In fact, all functions used to generate many of the figures in the book are provided on the site.) If a little more interested, the reader can read through the code to see how things are done. A step further, of course, is to modify the code and use different parameters and different datasets to better appreciate the various effects of the algorithms.

Back to Euclid

Finally, let us end by again quoting Euclid; this time, a story about him. A youth who had begun to study geometry with Euclid, when he had learned the first proposition, asked, "What do I get by learning these things?" So Euclid called a slave and said, "Give him three pence, since he must make a gain out of what he learns." Surely explicit efforts are made in this book to discuss the practical uses of the orthogonal transforms as well as the mathematics behind them, but one should realize that after all the book is about a set of mathematical tools, just like those propositions in Euclid's geometry, out of learning which the reader may not be able to make a direct and immediate gain. However, in the end, it is the application of these tools toward solving specific problems in practice that will enable the reader to make a gain out of the book, much more than three pence, hopefully.

Acknowledgment

The author is indebted to two of his colleagues, Professors John Molinder and Ellis Cumberbatch, for their support and help with the book project. In addition to our discussions regarding some of the topics in the book, John provided the application example of orthogonal frequency division modulation (OFDM) discussed in section ??, together with the Matlab code that is used in a homework problem. Also, Ellis read through the first two chapters of the manuscript and made numerous suggestions for the improvement of the coverage of the topics in these two chapters. All such valuable help and support are greatly appreciated.

Notation

General notation

iff : if and only if
$j = \sqrt{-1} = e^{j\pi/2}$ : imaginary unit
$\overline{u + jv} = u - jv$ : complex conjugate of $u + jv$
$\mathrm{Re}(u + jv) = u$ : real part of $u + jv$
$\mathrm{Im}(u + jv) = v$ : imaginary part of $u + jv$
$|u + jv| = \sqrt{u^2 + v^2}$ : magnitude (absolute value) of the complex value $u + jv$
$\angle(u + jv) = \tan^{-1}(v/u)$ : phase of $u + jv$
$\mathbf{x}_{n \times 1}$ : an $n$ by 1 column vector
$\overline{\mathbf{x}}$ : complex conjugate of $\mathbf{x}$
$\mathbf{x}^T$ : transpose of $\mathbf{x}$, a 1 by $n$ row vector
$\mathbf{x}^* = \overline{\mathbf{x}}^T$ : conjugate transpose of $\mathbf{x}$
$\|\mathbf{x}\|$ : norm of vector $\mathbf{x}$
$\mathbf{A}_{m \times n}$ : an $m$ by $n$ matrix of $m$ rows and $n$ columns
$\overline{\mathbf{A}}$ : complex conjugate of matrix $\mathbf{A}$
$\mathbf{A}^{-1}$ : inverse of matrix $\mathbf{A}$
$\mathbf{A}^T$ : transpose of matrix $\mathbf{A}$
$\mathbf{A}^* = \overline{\mathbf{A}^T} = \overline{\mathbf{A}}^T$ : conjugate transpose of matrix $\mathbf{A}$
$\mathbb{N}$ : set of all positive integers including 0
$\mathbb{Z}$ : set of all integers
$\mathbb{R}$ : set of all real numbers
$\mathbb{C}$ : set of all complex numbers
$\mathbb{R}^N$ : $N$-dimensional Euclidean space
$\mathbb{C}^N$ : $N$-dimensional unitary space
$L^2$ : space of all square-integrable functions
$l^2$ : space of all square-summable infinite vectors (sequences)
$x(t)$ : a function representing a continuous signal
$\mathbf{x} = [\dots, x[n], \dots]^T$ : a vector representing a discrete signal
$\dot{x}(t) = dx(t)/dt$ : first-order time derivative of $x(t)$
$\ddot{x}(t) = d^2x(t)/dt^2$ : second-order time derivative of $x(t)$
$f$ : frequency (cycles per unit time)
$\omega = 2\pi f$ : angular frequency (radians per unit time)

Throughout the book, the angular frequency ω will be used interchangeably with 2πf, whichever is more convenient in the context of the discussion. As a convention, a bold-faced lower-case letter x is typically used to represent a vector, while a bold-faced upper-case letter A represents a matrix, unless otherwise noted.

1 Signals and Systems

In the first two chapters, we will consider some basic concepts and ideas as the mathematical background for the specific discussions of the various orthogonal transform methods in the subsequent chapters. Here we will set up a framework common to all such methods, so that they can be studied from a unified point of view. While some discussions here may seem mathematical, the emphasis is on intuitive understanding instead of theoretical rigor.

1.1 Continuous and Discrete Signals

A physical signal can always be represented as a real- or complex-valued continuous function of time $x(t)$ (unless otherwise specified, such as a function of space). The continuous signal can be sampled to become a discrete signal $x[n]$. If the time interval between two consecutive samples is assumed to be $\Delta$, then the $n$th sample is:

$x[n] = x(t)\big|_{t=n\Delta} = x(n\Delta)$    (1.1)

In either the continuous or the discrete case, a signal can be assumed in theory to have infinite duration, i.e., $-\infty < t < \infty$ for $x(t)$ and $-\infty < n < \infty$ for $x[n]$. However, any signal in practice is finite and can be considered as the truncated version of a signal of infinite duration. We typically assume $0 \le t \le T$ for a finite continuous signal $x(t)$, and $1 \le n \le N$ (or sometimes $0 \le n \le N-1$ for certain convenience) for a discrete signal $x[n]$. The value of such a finite signal $x(t)$ is not defined if $t < 0$ or $t > T$; similarly, $x[n]$ is not defined if $n < 0$ or $n > N$. However, for mathematical convenience we could sometimes assume a finite signal to be periodic, i.e., $x(t + T) = x(t)$ and $x[n + N] = x[n]$.

A discrete signal can also be represented as a vector $\mathbf{x} = [\dots, x[n-1], x[n], x[n+1], \dots]^T$ of finite or infinite dimension, composed of all of its samples or components as the vector elements. We will always represent a discrete signal as a column vector (the transpose of a row vector).

Figure 1.1 Sampling and reconstruction of a continuous signal

Definition 1.1. The discrete unit impulse or Kronecker delta function is defined as:

$\delta[n] = \begin{cases} 1 & n = 0 \\ 0 & n \ne 0 \end{cases}$    (1.2)

Based on this definition, a discrete signal can be represented as:

$x[n] = \sum_{m=-\infty}^{\infty} x[m]\,\delta[n-m], \qquad (n = 0, \pm 1, \pm 2, \dots)$    (1.3)

This equation can be interpreted in two conceptually different ways. First, a discrete signal $x[n]$ can be decomposed into a set of unit impulses, each at a different moment $n = m$ and weighted by the signal amplitude $x[m]$ at that moment, as shown in Fig. 1.1(a). Second, the Kronecker delta $\delta[n-m]$ acts as a filter that sifts out a particular value of the signal $x[n]$ at the moment $m = n$ from the sequence of signal samples $x[m]$ for all $m$. This is the sifting property of the Kronecker delta.
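The decomposition of Eq. 1.3 is easy to verify numerically. Below is a minimal Matlab sketch (the test signal and its length are arbitrary assumptions for illustration) that rebuilds a discrete signal as a weighted sum of shifted Kronecker deltas:

    % rebuild a discrete signal from weighted, shifted unit impulses (Eq. 1.3)
    x  = [3 1 4 1 5 9 2 6];      % an arbitrary test signal
    N  = length(x);
    xr = zeros(1, N);            % reconstruction accumulator
    for m = 1:N
        d    = zeros(1, N);      % Kronecker delta shifted to n = m
        d(m) = 1;
        xr   = xr + x(m) * d;    % weight each impulse by the sample x[m]
    end
    disp(max(abs(x - xr)))       % prints 0: the two signals are identical

The loop makes the conceptual decomposition explicit; in practice one would of course never reconstruct a signal this way, since the sum trivially returns the original samples.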

In a similar manner, a continuous signal $x(t)$ can also be represented by its samples. We first define a unit square impulse function as:

$\delta_\Delta(t) = \begin{cases} 1/\Delta & 0 \le t < \Delta \\ 0 & \text{else} \end{cases}$    (1.4)

Note that the width and height of this square impulse are respectively $\Delta$ and $1/\Delta$, i.e., it covers a unit area $\Delta \cdot (1/\Delta) = 1$, independent of the value of $\Delta$:

$\int_{-\infty}^{\infty} \delta_\Delta(t)\,dt = \Delta \cdot \frac{1}{\Delta} = 1$    (1.5)

Now a continuous signal $x(t)$ can be approximated as a sequence of square impulses $\delta_\Delta(t - n\Delta)$, weighted by the sample value $x[n]$ for the amplitude of the signal at the moment $t = n\Delta$:

$x(t) \approx \hat{x}(t) = \sum_{n=-\infty}^{\infty} x[n]\,\delta_\Delta(t - n\Delta)\,\Delta$    (1.6)

This is shown in Fig. 1.1(b). The approximation $\hat{x}(t)$ above will become a perfect reconstruction of the signal if we take the limit $\Delta \to 0$, so that the square impulse becomes a continuous unit impulse or Dirac delta:

$\lim_{\Delta \to 0} \delta_\Delta(t) = \delta(t)$    (1.7)

which is also formally defined as below:

Definition 1.2. The continuous unit impulse or Dirac delta function $\delta(t)$ is a function that has an infinite height but zero width at $t = 0$, and it covers a unit area, i.e., it satisfies the following two conditions:

$\delta(t) = \begin{cases} \infty & t = 0 \\ 0 & t \ne 0 \end{cases}, \qquad \int_{-\infty}^{\infty} \delta(t)\,dt = 1$    (1.8)

Now at the limit $\Delta \to 0$, the summation in the approximation of Eq. 1.6 above becomes an integral, the square impulse becomes a Dirac delta, and the approximation becomes a perfect reconstruction of the continuous signal:

$x(t) = \lim_{\Delta \to 0} \sum_{n=-\infty}^{\infty} x[n]\,\delta_\Delta(t - n\Delta)\,\Delta = \int_{-\infty}^{\infty} x(\tau)\,\delta(t - \tau)\,d\tau$    (1.9)

In particular, when $t = 0$, Eq. 1.9 becomes:

$x(0) = \int_{-\infty}^{\infty} x(\tau)\,\delta(\tau)\,d\tau$    (1.10)

Eq. 1.9 can be interpreted in two conceptually different ways. First, a continuous signal $x(t)$ can be decomposed into a set of infinitely many uncountable unit impulses, each at a different moment $t = \tau$, weighted by the signal intensity $x(\tau)$ at the moment $t = \tau$. Second, the Dirac delta $\delta(\tau - t)$ acts as a filter that sifts out the value of $x(t)$ at the moment $\tau = t$ from a sequence of infinite uncountable signal samples. This is the sifting property of the Dirac delta.
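The staircase approximation of Eq. 1.6 can be visualized with a few lines of Matlab (the Gaussian-shaped test signal and the value of Delta are assumptions for illustration); shrinking Delta makes the staircase converge to the signal, as Eq. 1.9 states:

    % staircase approximation of a continuous signal (Eq. 1.6)
    Delta = 0.2;                     % sampling interval; try 0.1 or 0.05
    t  = -1:0.001:1;                 % dense grid standing in for continuous time
    x  = exp(-4*t.^2);               % an arbitrary smooth test signal
    xh = zeros(size(t));             % the approximation xhat(t)
    for n = -5/Delta : 5/Delta
        in = (t >= n*Delta) & (t < (n+1)*Delta);  % support of delta_Delta(t - n*Delta)
        xh(in) = exp(-4*(n*Delta)^2);             % x[n]*delta_Delta(t-n*Delta)*Delta = x(n*Delta)
    end
    plot(t, x, t, xh)                % the staircase hugs x(t) as Delta shrinks

On each interval the product $x[n]\,\delta_\Delta(t - n\Delta)\,\Delta$ equals $x(n\Delta)$, since the height $1/\Delta$ cancels the factor $\Delta$; this is exactly why the staircase has the amplitude of the signal itself.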

Note that the discrete impulse function $\delta[n]$ has a unit height, while the continuous impulse function $\delta(t)$ has a unit area (the product of height and width for time), i.e., the two types of impulses have different dimensions. The dimension of the discrete impulse is the same as that of the signal (e.g., voltage), while the dimension of the continuous impulse is the signal's dimension divided by time (e.g., voltage/time). In other words, $x(\tau)\,\delta(t - \tau)$ represents the density of the signal at $t = \tau$; only when integrated over time will the continuous impulse functions have the same dimension as the signal $x(t)$.

The results above indicate that a time signal, either continuous or discrete, can be decomposed in the time domain to become a linear combination, either an integral or a summation, of a sequence of time impulses (or components). However, as we will see in future chapters, the decomposition of the time signal is not unique. The signal can also be decomposed in domains other than time, such as frequency, and the representations of the signal in different domains are related by certain orthogonal transformations.

1.2 Unit Step and Nascent Delta Functions

The discrete unit step function defined below is an important function to be used frequently in the future:

Definition 1.3.

$u[n] = \begin{cases} 1 & n \ge 0 \\ 0 & n < 0 \end{cases}$    (1.11)

The Kronecker delta can be obtained as the first-order difference of the unit step function:

$\delta[n] = u[n] - u[n-1] = \begin{cases} 1 & n = 0 \\ 0 & n \ne 0 \end{cases}$    (1.12)

Similarly, in the continuous case, the impulse function $\delta(t)$ is closely related to the continuous unit step function (also called the Heaviside step function) $u(t)$. To see this, we first consider a piece-wise linear function defined as:

$u_\Delta(t) = \begin{cases} 0 & t < 0 \\ t/\Delta & 0 \le t < \Delta \\ 1 & t \ge \Delta \end{cases}$    (1.13)

Taking the time derivative of this function, we get the square impulse considered before in Eq. 1.4:

$\delta_\Delta(t) = \frac{d}{dt} u_\Delta(t) = \begin{cases} 0 & t < 0 \\ 1/\Delta & 0 \le t < \Delta \\ 0 & t \ge \Delta \end{cases}$    (1.14)

Figure 1.2 Generation of unit step and unit impulse

If we let $\Delta$ approach zero, then $u_\Delta(t)$ becomes the unit step function $u(t)$ at the limit:

Definition 1.4.

$u(t) = \lim_{\Delta \to 0} u_\Delta(t) = \begin{cases} 0 & t < 0 \\ 1/2 & t = 0 \\ 1 & t > 0 \end{cases}$    (1.15)

Here we define $u(0) = 1/2$ at $t = 0$ for reasons to be discussed in the future. (In some of the literature it is alternatively defined as either $u(0) = 0$ or $u(0) = 1$.)

Also, at the limit, $\delta_\Delta(t)$ becomes the Dirac delta discussed above:

$\delta(t) = \lim_{\Delta \to 0} \delta_\Delta(t) = \begin{cases} \infty & t = 0 \\ 0 & t \ne 0 \end{cases}$    (1.16)

Therefore, by taking the limit $\Delta \to 0$ on both sides of Eq. 1.14, we obtain a useful relationship between $u(t)$ and $\delta(t)$:

$\frac{d}{dt}u(t) = \delta(t), \qquad u(t) = \int_{-\infty}^{t} \delta(\tau)\,d\tau$    (1.17)

In addition to the square impulse $\delta_\Delta(t)$, the Dirac delta can also be generated from a variety of different nascent delta functions at the limit, when a certain parameter of the function approaches either zero or infinity. Consider, for example, the Gaussian function:

$g(t) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-t^2/2\sigma^2}$    (1.18)

which is the probability density function of a normally distributed random variable $t$ with zero mean and variance $\sigma^2$. Obviously the area underneath this density function is always one, independent of $\sigma$:

$\int_{-\infty}^{\infty} g(t)\,dt = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-t^2/2\sigma^2}\,dt = 1$    (1.19)

Figure 1.3 Gaussian functions with different σ values

At the limit $\sigma \to 0$, this Gaussian function $g(t)$ becomes infinity when $t = 0$ but is zero for all $t \ne 0$, i.e., it becomes the unit impulse function:

$\lim_{\sigma \to 0} \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-t^2/2\sigma^2} = \delta(t)$    (1.20)

The argument $t$ of a Dirac delta $\delta(t)$ may be scaled so that it becomes $\delta(at)$. In this case Eq. 1.10 becomes:

$\int_{-\infty}^{\infty} x(\tau)\,\delta(a\tau)\,d\tau = \int_{-\infty}^{\infty} x\!\left(\frac{u}{a}\right)\delta(u)\,\frac{1}{|a|}\,du = \frac{1}{|a|}\,x(0)$    (1.21)

where we have defined $u = a\tau$. Comparing this result with Eq. 1.10, we see that $\delta(at) = \delta(t)/|a|$, i.e.,

$|a|\,\delta(at) = \delta(t)$    (1.22)

For example, a delta function $\delta(f)$ in frequency can also be expressed as a function of angular frequency $\omega = 2\pi f$ as:

$\delta(f) = 2\pi\,\delta(\omega)$    (1.23)
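The nascent-delta behavior of Eqs. 1.19 and 1.20 can be checked numerically: as σ shrinks, the area under the Gaussian stays 1 while the integral of the Gaussian against a test signal approaches the sampled value $x(0)$, i.e., the sifting property of Eq. 1.10 emerges. A minimal Matlab sketch (the test signal is an arbitrary assumption, with $x(0) = 1$):

    % Gaussian nascent delta: unit area and sifting property (Eqs. 1.19, 1.20, 1.10)
    t = -5:1e-4:5;
    x = cos(t) + 0.5*t.^2;                  % arbitrary smooth test signal, x(0) = 1
    for sigma = [1 0.1 0.01]
        g    = exp(-t.^2/(2*sigma^2)) / sqrt(2*pi*sigma^2);
        area = trapz(t, g);                 % stays ~1 for every sigma (Eq. 1.19)
        sift = trapz(t, g .* x);            % approaches x(0) = 1 as sigma -> 0
        fprintf('sigma=%5.2f  area=%.4f  integral of g*x=%.4f\n', sigma, area, sift);
    end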

More generally, the Dirac delta may also be defined over a function $f(t)$, instead of a variable $t$, so that it becomes $\delta(f(t))$, which is zero except when $f(t) = 0$, i.e., when $t = t_k$ is one of the roots of $f(t)$ ($f(t_k) = 0$). To see how such an impulse is scaled, consider the following integral:

$\int_{-\infty}^{\infty} x(\tau)\,\delta(f(\tau))\,d\tau = \int_{-\infty}^{\infty} x(\tau)\,\delta(u)\,\frac{1}{|f'(\tau)|}\,du$    (1.24)

where we have changed the integral variable from $\tau$ to $u = f(\tau)$. If $\tau = \tau_0$ is the only root of $f(\tau)$, i.e., $u = f(\tau_0) = 0$, then the integral above becomes:

$\int_{-\infty}^{\infty} x(\tau)\,\delta(f(\tau))\,d\tau = \frac{x(\tau_0)}{|f'(\tau_0)|}$    (1.25)

If $f(\tau)$ has multiple roots $\tau_k$, then we have:

$\int_{-\infty}^{\infty} x(\tau)\,\delta(f(\tau))\,d\tau = \sum_k \frac{x(\tau_k)}{|f'(\tau_k)|}$    (1.26)

This is the generalized sifting property of the impulse function. We can now express the delta function as:

$\delta(f(t)) = \sum_k \frac{\delta(t - t_k)}{|f'(t_k)|}$    (1.27)

which is composed of a set of impulses, each corresponding to one of the roots of $f(t)$, weighted by the reciprocal of the absolute value of the derivative of the function evaluated at that root.

1.3 Relationship Between Complex Exponentials and Delta Functions

Here we list a set of six important formulas that will be used in the discussions of the various forms of the Fourier transform in Chapters 3 and 4. These formulas show that the Kronecker and Dirac delta functions can be generated as the sum or integral of some forms of the general complex exponential function $e^{j2\pi ft} = e^{j\omega t}$. The proofs of these formulas are left as homework problems.

I. Dirac delta as an integral of a complex exponential:

$\int_{-\infty}^{\infty} e^{\pm j2\pi ft}\,dt = \int_{-\infty}^{\infty} \cos(2\pi ft)\,dt \pm j\int_{-\infty}^{\infty} \sin(2\pi ft)\,dt = 2\int_{0}^{\infty} \cos(2\pi ft)\,dt = \delta(f) = 2\pi\,\delta(\omega)$    (1.28)

Note that the integral of the odd function $\sin(2\pi ft)$ over all time $-\infty < t < \infty$ is zero, while the integral of the even function $\cos(2\pi ft)$ over all time is twice its integral over $0 < t < \infty$. Eq. 1.28 can also be interpreted intuitively: the integral of any sinusoid over all time is always zero, except if $f = 0$ and $e^{\pm j2\pi ft} = 1$, in which case the integral becomes infinity. Alternatively, if we integrate the complex exponential with respect to frequency $f$, we get:

$\int_{-\infty}^{\infty} e^{\pm j2\pi ft}\,df = 2\int_{0}^{\infty} \cos(2\pi ft)\,df = \delta(t)$    (1.29)

which can also be interpreted intuitively as a superposition of uncountably infinite sinusoids with progressively higher frequencies $f$. These sinusoids cancel each other at any time $t \ne 0$, except if $t = 0$ and $\cos(2\pi ft) = 1$ for all $f$, in which case their superposition becomes infinity.

Ia. This formula is associated with Eq. 1.28:

$\int_{0}^{\infty} e^{\pm j2\pi ft}\,dt = \int_{0}^{\infty} e^{\pm j\omega t}\,dt = \frac{1}{2}\,\delta(f) \mp \frac{1}{j2\pi f} = \pi\,\delta(\omega) \mp \frac{1}{j\omega}$    (1.30)

Given the above, we can also get:

$\int_{-\infty}^{0} e^{\pm j\omega t}\,dt = \int_{0}^{\infty} e^{\pm j\omega(-t)}\,dt = \int_{0}^{\infty} e^{\mp j\omega t}\,dt = \frac{1}{2}\,\delta(f) \pm \frac{1}{j2\pi f} = \pi\,\delta(\omega) \pm \frac{1}{j\omega}$    (1.31)

Adding the two equations above, we get the same result as given in Eq. 1.28:

$\int_{-\infty}^{\infty} e^{\pm j\omega t}\,dt = \int_{-\infty}^{0} e^{\pm j\omega t}\,dt + \int_{0}^{\infty} e^{\pm j\omega t}\,dt = \delta(f) = 2\pi\,\delta(\omega)$    (1.32)

II. Kronecker delta as an integral of a complex exponential:

$\frac{1}{T}\int_{0}^{T} e^{\pm j2\pi kt/T}\,dt = \frac{1}{T}\int_{0}^{T} \cos(2\pi kt/T)\,dt \pm j\,\frac{1}{T}\int_{0}^{T} \sin(2\pi kt/T)\,dt = \frac{1}{T}\int_{0}^{T} \cos(2\pi kt/T)\,dt = \delta[k]$    (1.33)

In particular, if $T = 1$, we have:

$\int_{0}^{1} e^{\pm j2\pi kt}\,dt = \delta[k]$    (1.34)

III. A train of Dirac deltas with period $F$ as a summation of a complex exponential:

$\frac{1}{F}\sum_{n=-\infty}^{\infty} e^{\pm j2\pi fn/F} = \frac{1}{F}\sum_{n=-\infty}^{\infty} \cos(2\pi fn/F) \pm \frac{j}{F}\sum_{n=-\infty}^{\infty} \sin(2\pi fn/F) = \frac{1}{F}\sum_{n=-\infty}^{\infty} \cos(2\pi fn/F) = \sum_{k=-\infty}^{\infty} \delta(f - kF) = \sum_{k=-\infty}^{\infty} 2\pi\,\delta(\omega - 2\pi kF)$    (1.35)

In particular, if $F = 1$, we have:

$\sum_{n=-\infty}^{\infty} e^{\pm j2\pi fn} = \sum_{k=-\infty}^{\infty} \delta(f - k) = \sum_{k=-\infty}^{\infty} 2\pi\,\delta(\omega - 2\pi k)$    (1.36)

IIIa. This formula is associated with Eq. 1.36:

$\sum_{n=0}^{\infty} e^{\pm j2\pi fn} = \frac{1}{2}\sum_{k=-\infty}^{\infty} \delta(f - k) + \frac{1}{1 - e^{\pm j2\pi f}} = \sum_{k=-\infty}^{\infty} \pi\,\delta(\omega - 2\pi k) + \frac{1}{1 - e^{\pm j\omega}}$    (1.37)

Given the above, we can also get:

$\sum_{n=-\infty}^{-1} e^{\pm j2\pi fn} = \sum_{n=0}^{\infty} e^{\mp j2\pi fn} - 1 = \frac{1}{2}\sum_{k=-\infty}^{\infty} \delta(f - k) + \frac{1}{1 - e^{\mp j2\pi f}} - 1$    (1.38)

Adding the two equations above, we get the same result as given in Eq. 1.36:

$\sum_{n=-\infty}^{\infty} e^{\pm j2\pi fn} = \sum_{n=-\infty}^{-1} e^{\pm j2\pi fn} + \sum_{n=0}^{\infty} e^{\pm j2\pi fn} = \sum_{k=-\infty}^{\infty} \delta(f - k) = 2\pi\sum_{k=-\infty}^{\infty} \delta(\omega - 2\pi k)$    (1.39)

IV. A train of Kronecker deltas with period $N$ as a summation of a complex exponential:

$\frac{1}{N}\sum_{n=0}^{N-1} e^{\pm j2\pi nm/N} = \frac{1}{N}\sum_{n=0}^{N-1} \cos(2\pi nm/N) \pm \frac{j}{N}\sum_{n=0}^{N-1} \sin(2\pi nm/N) = \frac{1}{N}\sum_{n=0}^{N-1} \cos(2\pi nm/N) = \sum_{k=-\infty}^{\infty} \delta[m - kN]$    (1.40)
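Among these six formulas, II and IV involve only finite integrals and sums and so can be checked directly on a computer. A small Matlab sketch for formula II (the period T is an arbitrary assumption): the integral over one period is 1 for k = 0 and vanishes for every other integer k, which is exactly the Kronecker delta δ[k]:

    % numerical check of (1/T) * integral over [0,T] of exp(j*2*pi*k*t/T) (Eq. 1.33)
    T = 2.5;                                   % arbitrary period
    t = linspace(0, T, 1e5);
    for k = -2:2
        v = trapz(t, exp(1j*2*pi*k*t/T)) / T;  % should be 1 for k = 0, else 0
        fprintf('k=%2d  integral = %.4f%+.4fj\n', k, real(v), imag(v));
    end

Formulas I, Ia, III, and IIIa involve delta functions and hold only in the distributional sense, so they cannot be verified by straightforward numerical integration.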

1.4 Attributes of Signals

A time signal can be characterized by the following parameters:

The energy contained in a continuous signal $x(t)$ is:

$E = \int_{-\infty}^{\infty} |x(t)|^2\,dt$    (1.41)

or in a discrete signal $x[n]$:

$E = \sum_{n=-\infty}^{\infty} |x[n]|^2$    (1.42)

Note that $|x(t)|^2$ and $|x[n]|^2$ have different dimensions; they represent, respectively, the power and the energy of the signal at the corresponding moment. If the energy contained in a signal is finite, $E < \infty$, then the signal is called an energy signal. A continuous energy signal is said to be square-integrable, and a discrete energy signal is said to be square-summable. All signals to be considered in the future, either continuous or discrete, will be assumed to be energy signals.

The average power of the signal:

$P = \lim_{T \to \infty} \frac{1}{T}\int_{0}^{T} |x(t)|^2\,dt$    (1.43)

or for a discrete signal:

$P = \lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} |x[n]|^2$    (1.44)

If $E$ of $x(t)$ is not finite but $P$ is, then $x(t)$ is a power signal. Obviously the average power of an energy signal is zero.
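For a sampled signal these attributes are one-liners in Matlab; the decaying exponential below is an arbitrary assumption chosen because it is an energy signal:

    % energy and average power of a sampled signal (Eqs. 1.42 and 1.44)
    n = 0:999;
    x = 0.9.^n;                     % a decaying (energy) signal
    E = sum(abs(x).^2);             % energy: ~ 1/(1 - 0.81) = 5.263
    P = E / length(x);              % average power over N samples -> 0 as N grows
    fprintf('E = %.4f, P = %.6f\n', E, P);

Repeating the computation with a longer n confirms the remark above: E converges to a finite value while P tends to zero, as expected of an energy signal.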

The cross-correlation defined below measures the similarity between two signals as a function of the relative time shift $\tau$:

$r_{xy}(\tau) = x(t) \star y(t) = \int_{-\infty}^{\infty} x(t)\,\overline{y}(t-\tau)\,dt = \int_{-\infty}^{\infty} x(t+\tau)\,\overline{y}(t)\,dt \ne \int_{-\infty}^{\infty} x(t-\tau)\,\overline{y}(t)\,dt = y(t) \star x(t) = r_{yx}(\tau)$    (1.45)

Note that the cross-correlation is not commutative. For discrete signals, we have:

$r_{xy}[m] = x[n] \star y[n] = \sum_{n=-\infty}^{\infty} x[n]\,\overline{y}[n-m] = \sum_{n=-\infty}^{\infty} x[n+m]\,\overline{y}[n]$    (1.46)

In particular, when $x(t) = y(t)$ and $x[n] = y[n]$, the cross-correlation becomes the autocorrelation, which measures the self-similarity of the signal:

$r_x(\tau) = \int_{-\infty}^{\infty} x(t)\,\overline{x}(t-\tau)\,dt = \int_{-\infty}^{\infty} x(t+\tau)\,\overline{x}(t)\,dt$    (1.47)

and

$r_x[m] = \sum_{n=-\infty}^{\infty} x[n]\,\overline{x}[n-m] = \sum_{n=-\infty}^{\infty} x[n+m]\,\overline{x}[n]$    (1.48)

More particularly, when $\tau = 0$ and $m = 0$, we have:

$r_x(0) = \int_{-\infty}^{\infty} |x(t)|^2\,dt, \qquad r_x[0] = \sum_{n=-\infty}^{\infty} |x[n]|^2$    (1.49)

which represent the total energy contained in the signal.
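Eq. 1.46 can be evaluated directly by shifting and summing. A minimal Matlab sketch (the two real sequences are arbitrary assumptions, and finite support stands in for the infinite sum):

    % discrete cross-correlation r_xy[m] = sum_n x[n+m] y[n] (Eq. 1.46)
    x = [1 2 3 4 3 2 1];
    y = [0 1 2 1 0 0 0];
    M = 4;                                   % compute lags m = -M..M
    r = zeros(1, 2*M+1);
    for m = -M:M
        s = 0;
        for n = 1:length(y)                  % finite-support version of the sum
            if n+m >= 1 && n+m <= length(x)
                s = s + x(n+m) * y(n);       % real signals, so no conjugate needed
            end
        end
        r(m+M+1) = s;
    end
    disp(r)                                  % agrees with xcorr(x, y, M) for real x, y

The explicit double loop mirrors the defining sum; in practice the built-in xcorr (or conv, as discussed at the end of this chapter) does the same job far more efficiently.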

A random time signal $x(t)$ is called a stochastic process. Its mean or expectation (Appendix B) is:

$\mu_x(t) = E[x(t)]$    (1.50)

The cross-covariance of two stochastic processes $x(t)$ and $y(t)$ is:

$\mathrm{Cov}_{xy}(t,\tau) = \sigma^2_{xy}(t,\tau) = E[(x(t) - \mu_x(t))\,\overline{(y(\tau) - \mu_y(\tau))}] = E[x(t)\,\overline{y}(\tau)] - \mu_x(t)\,\overline{\mu}_y(\tau)$    (1.51)

A stochastic process $x(t)$ can be truncated and sampled to become a random vector $\mathbf{x} = [x[1], \dots, x[N]]^T$. The mean or expectation of $\mathbf{x}$ is a vector:

$\boldsymbol{\mu}_x = E[\mathbf{x}]$    (1.52)

with the $n$th element of $\boldsymbol{\mu}_x$ being $\mu[n] = E[x[n]]$. The cross-covariance of $\mathbf{x}$ and $\mathbf{y}$ is an $N$ by $N$ matrix:

$\boldsymbol{\Sigma}_{xy} = E[(\mathbf{x} - \boldsymbol{\mu}_x)(\mathbf{y} - \boldsymbol{\mu}_y)^*] = E[\mathbf{x}\mathbf{y}^*] - \boldsymbol{\mu}_x\boldsymbol{\mu}_y^*$    (1.53)

with the $mn$th element being:

$\sigma^2_{xy}[m,n] = E[(x[m] - \mu_x[m])\,\overline{(y[n] - \mu_y[n])}] = E[x[m]\,\overline{y}[n]] - \mu_x[m]\,\overline{\mu}_y[n]$    (1.54)

In particular, when $x(t) = y(t)$ and $x[n] = y[n]$, the cross-covariance becomes the autocovariance:

$\mathrm{Cov}_x(t,\tau) = \sigma^2_x(t,\tau) = E[(x(t) - \mu_x(t))\,\overline{(x(\tau) - \mu_x(\tau))}] = E[x(t)\,\overline{x}(\tau)] - \mu_x(t)\,\overline{\mu}_x(\tau)$    (1.55)

and

$\boldsymbol{\Sigma}_x = E[(\mathbf{x} - \boldsymbol{\mu}_x)(\mathbf{x} - \boldsymbol{\mu}_x)^*] = E[\mathbf{x}\mathbf{x}^*] - \boldsymbol{\mu}_x\boldsymbol{\mu}_x^*$    (1.56)

More particularly, when $t = \tau$ and $m = n$, we have:

$\sigma^2_x(t) = E[|x(t)|^2] - |\mu_x(t)|^2, \qquad \sigma^2_x[n] = E[|x[n]|^2] - |\mu_x[n]|^2$    (1.57)

We see that $\sigma^2_x(t)$ represents the average dynamic power of the signal $x(t)$, and $\sigma^2_x[n]$ represents the average dynamic energy contained in the $n$th signal component $x[n]$.

1.5 Signal Arithmetics and Transformations

Any of the arithmetic operations (addition/subtraction and multiplication/division) can be applied to two continuous signals $x(t)$ and $y(t)$, or two discrete signals $x[n]$ and $y[n]$, to produce a new signal $z(t)$ or $z[n]$:

Scaling: $z(t) = a\,x(t)$ or $z[n] = a\,x[n]$;
Addition/subtraction: $z(t) = x(t) \pm y(t)$ or $z[n] = x[n] \pm y[n]$;
Multiplication: $z(t) = x(t)\,y(t)$ or $z[n] = x[n]\,y[n]$;
Division: $z(t) = x(t)/y(t)$ or $z[n] = x[n]/y[n]$.

Note that these operations are actually applied to the amplitude values of the two signals $x(t)$ and $y(t)$ at each moment $t$, and the result becomes the value of $z(t)$ at the same moment; the same is true for the operations on the discrete signals.

Moreover, a linear transformation in the general form of $y = ax + b = a(x + b/a)$ can be applied to the amplitude of a function $x(t)$ (vertical in the time plot) in two steps:

Translation: $y(t) = x(t) + x_0$: the time function $x(t)$ is moved either upward if $x_0 > 0$ or downward if $x_0 < 0$.
Scaling: $y(t) = a\,x(t)$: the time function $x(t)$ is either up-scaled if $|a| > 1$ or down-scaled if $|a| < 1$. $x(t)$ is also flipped vertically (upside-down) if $a < 0$.

The same linear transformation $y = ax + b$ can also be applied to the time argument $t$ of the function $x(t)$ (horizontal in the time plot) as well as to its amplitude:

$\tau = at + t_0 = a(t + t_0/a), \qquad y(\tau) = x(at + t_0) = x(a(t + t_0/a))$    (1.58)

Translation or shift: $y(t) = x(t + t_0)$ is translated by $|t_0|$, either to the right if $t_0 < 0$ or to the left if $t_0 > 0$.
Scaling: $y(t) = x(at)$ is either compressed if $|a| > 1$ or expanded if $|a| < 1$. The signal is also reversed (flipped horizontally) in time if $a < 0$.

In general, the transformation in time $y(t) = x(at + t_0) = x(a(t + t_0/a))$, containing both translation and scaling, can be carried out by either of the following two methods.

1. A two-step process:
Step 1: define an intermediate signal $z(t) = x(t + t_0)$ due to translation;
Step 2: find the transformed signal $y(t) = z(at)$ due to time scaling (containing time reversal if $a < 0$).
The two steps can be carried out equivalently in reverse order:
Step 1: define an intermediate signal $z(t) = x(at)$ due to time scaling (containing time reversal if $a < 0$);
Step 2: find the transformed signal $y(t) = z(t + t_0/a)$ due to translation.
However, note that the translation parameters (direction and amount) differ depending on whether the translation is carried out before or after the scaling.

2. A two-point process: evaluate $x(t)$ at two arbitrarily chosen time points $t = t_1$ and $t = t_2$ to get $v_1 = x(t_1)$ and $v_2 = x(t_2)$. Then $y(t) = x(at + t_0) = v_1$ when its argument is $at + t_0 = t_1$, i.e., when $t = (t_1 - t_0)/a$, and $y(t) = x(at + t_0) = v_2$ when $at + t_0 = t_2$, i.e., $t = (t_2 - t_0)/a$. As the transformation $at + t_0$ is linear, the value of $y(t)$ at any other time moment $t$ can be found by linear interpolation based on these two points.

Example 1.1: Consider the transformation of a time signal:

$x(t) = \begin{cases} t & 0 < t < 2 \\ 0 & \text{else} \end{cases}$    (1.59)

Translation: $y(t) = x(t + 3)$ and $z(t) = x(t - 1)$ are shown in Fig. 1.4(a).
Expansion/compression: $y(t) = x(2t/3)$ and $z(t) = x(2t)$ are shown in Fig. 1.4(b).
Time reversal: $y(t) = x(-t)$ and $z(t) = x(-2t)$ are shown in Fig. 1.4(c).
Combination of translation, scaling, and reversal:

$y(t) = x(-2t + 3) = x\!\left(-2\left(t - \frac{3}{2}\right)\right)$    (1.60)

Figure 1.4 Transformation of continuous signal

Method 1: based on the first expression $y(t) = x(-2t + 3)$, we get (Fig. 1.4(d)):

$z(t) = x(t + 3), \qquad y(t) = z(-2t)$    (1.61)

Alternatively, based on the second expression $y(t) = x(-2(t - 3/2))$, we get (Fig. 1.4(e)):

$z(t) = x(-2t), \qquad y(t) = z\!\left(t - \frac{3}{2}\right)$    (1.62)

Method 2: the signal has two break points at $t_1 = 0$ and $t_2 = 2$; correspondingly, the two break points of $y(t)$ can be found to be:

$-2t + 3 = t_1 = 0 \;\Rightarrow\; t = \frac{3}{2}$
$-2t + 3 = t_2 = 2 \;\Rightarrow\; t = \frac{1}{2}$

By linear interpolation based on these two points, the entire signal $y(t)$ can easily be obtained, the same as that obtained by the previous method, as shown in Fig. 1.4(d) and (e).
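The two-step method of Example 1.1 translates almost literally into Matlab. In the sketch below, the anonymous-function form of x(t) is an assumption made for illustration; composing the two steps reproduces y(t) = x(-2t + 3):

    % y(t) = x(-2t+3) via translate-then-scale (Example 1.1, Method 1)
    x = @(t) t .* (t > 0 & t < 2);   % the signal of Eq. 1.59
    t = -2:0.01:4;
    z = @(t) x(t + 3);               % step 1: z(t) = x(t+3)
    y = @(t) z(-2*t);                % step 2: y(t) = z(-2t) = x(-2t+3)
    plot(t, x(t), t, y(t))
    % Method 2 check: break points -2t+3 = 0 -> t = 3/2, and -2t+3 = 2 -> t = 1/2

Evaluating y at t = 1/2 and t = 3/2 confirms the two break points found by the two-point method.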

Figure 1.5 Transformation of discrete signal

In the transformation of discrete signals, the expansion and compression for continuous signals are replaced respectively by up-sampling and down-sampling.

Down-sampling (decimation): keep every $N$th sample and discard the rest. The signal size becomes $1/N$ of the original one:

$x_{(N)}[n] = x[nN]$    (1.63)

For example, if $N = 3$: $x_{(3)}[0] = x[0]$, $x_{(3)}[1] = x[3]$, $x_{(3)}[2] = x[6]$, etc.

Up-sampling (interpolation by zero stuffing): insert $N - 1$ zeros between every two consecutive samples $x[n]$ and $x[n+1]$. The signal size becomes $N$ times the original one:

$x^{(N)}[n] = \begin{cases} x[n/N] & n = 0, \pm N, \pm 2N, \dots \\ 0 & \text{else} \end{cases}$    (1.64)

For example, if $N = 2$: $x^{(2)}[0] = x[0]$, $x^{(2)}[2] = x[1]$, $x^{(2)}[4] = x[2]$, etc., and $x^{(2)}[n] = 0$ for all other $n$.

Example 1.2: Given $x[n]$ as shown in Fig. 1.5(a), a transformation $y[n] = x[-n + 4]$, shown in Fig. 1.5(b), can be obtained based on two time points:

$-n + 4 = 0 \;\Rightarrow\; n = 4$
$-n + 4 = 3 \;\Rightarrow\; n = 1$    (1.65)

The up- and down-sampled versions of the signal in Fig. 1.5(a), obtained by tabulating $x^{(2)}[n]$ and $x_{(2)}[n]$ sample by sample, are shown in Fig. 1.5(c) and (d), respectively.
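Down-sampling and up-sampling as defined in Eqs. 1.63 and 1.64 take one line each in Matlab (the test sequence is an arbitrary assumption; note that the 0-based indices of the text become 1-based here):

    % down-sampling (keep every Nth sample) and up-sampling (zero stuffing)
    x  = [1 2 3 4 5 6 7 8 9];
    N  = 3;
    xd = x(1:N:end);                 % down-sampled: [1 4 7]
    xu = zeros(1, N*length(x));      % up-sampled by zero stuffing
    xu(1:N:end) = x;                 % [1 0 0 2 0 0 3 0 0 ...]
    disp(xd); disp(xu)

Down-sampling discards information irreversibly, while up-sampling preserves all the original samples; this asymmetry matters later when filter banks are built from these two operations.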

1.6 Linear and Time Invariant Systems

A generic system (electrical, mechanical, biological, economical, etc.) can be symbolically represented in terms of the relationship between its input $x(t)$ (stimulus, excitation) and output $y(t)$ (response, reaction):

$O[x(t)] = y(t)$    (1.67)

where the symbol $O[\,\cdot\,]$ represents the operation applied by the system to its input. A system is linear if its input-output relationship satisfies both homogeneity and superposition.

Homogeneity:

$O[a\,x(t)] = a\,O[x(t)] = a\,y(t)$    (1.68)

Superposition: if $O[x_n(t)] = y_n(t)$ $(n = 1, 2, \dots, N)$, then:

$O\!\left[\sum_{n=1}^{N} x_n(t)\right] = \sum_{n=1}^{N} O[x_n(t)] = \sum_{n=1}^{N} y_n(t)$    (1.69)

or

$O\!\left[\int x(\tau)\,d\tau\right] = \int O[x(\tau)]\,d\tau = \int y(\tau)\,d\tau$    (1.70)

Combining these two properties, we have:

$O\!\left[\sum_{n=1}^{N} a_n x_n(t)\right] = \sum_{n=1}^{N} a_n\,O[x_n(t)] = \sum_{n=1}^{N} a_n\,y_n(t)$    (1.71)

or

$O\!\left[\int a(\tau)\,x(\tau)\,d\tau\right] = \int a(\tau)\,O[x(\tau)]\,d\tau = \int a(\tau)\,y(\tau)\,d\tau$    (1.72)

A system is time-invariant if how it responds to the input does not change over time. In other words:

if $O[x(t)] = y(t)$, then $O[x(t - \tau)] = y(t - \tau)$    (1.73)

A linear and time-invariant (LTI) system is both linear and time-invariant.

As an example, we see that the response of an LTI system $y(t) = O[x(t)]$ to $dx(t)/dt$ is $dy(t)/dt$:

$O\!\left[\frac{1}{\Delta}\,[x(t + \Delta) - x(t)]\right] = \frac{1}{\Delta}\,[y(t + \Delta) - y(t)]$    (1.74)

Taking the limit $\Delta \to 0$, we get:

$O\!\left[\frac{d}{dt}x(t)\right] = O[\dot{x}(t)] = \frac{d}{dt}y(t) = \dot{y}(t)$    (1.75)

Example 1.3: Determine whether each of the following systems is linear.

The input $x(t)$ is the voltage across a resistor $R$ and the output $y(t)$ is the current through $R$:

$y(t) = O[x(t)] = \frac{x(t)}{R}$    (1.76)

This is obviously a linear system.

The input $x(t)$ is the voltage across a resistor $R$ and the output $y(t)$ is the power consumed by $R$:

$y(t) = O[x(t)] = \frac{x^2(t)}{R}$    (1.77)

This is not a linear system.

The input $x(t)$ is the voltage across a resistor $R$ and a capacitor $C$ in series, and the output is the voltage across $C$:

$RC\,\frac{d}{dt}y(t) + y(t) = \tau\,\frac{d}{dt}y(t) + y(t) = x(t)$    (1.78)

where $\tau = RC$ is the time constant of the system. As the system is characterized by a linear, first-order ordinary differential equation (ODE), it is linear.

A system produces its output $y(t)$ by adding a constant $a$ to its input $x(t)$:

$y(t) = O[x(t)] = x(t) + a$    (1.79)

Consider:

$O[x_1(t) + x_2(t)] = x_1(t) + x_2(t) + a \;\ne\; O[x_1(t)] + O[x_2(t)] = x_1(t) + x_2(t) + 2a$    (1.80)

This is not a linear system.

The input $x(t)$ is the force $F$ applied to a spring of length $l_0$ and spring constant $k$; the output is the length of the spring. According to Hooke's law, $\Delta l = kF = k\,x(t)$, and we have:

$y(t) = l = l_0 + \Delta l = l_0 + k\,x(t)$    (1.81)

This is not a linear system.

Same as above, except that the output $y(t) = l - l_0 = \Delta l$ is the displacement of the moving end of the spring:

$y(t) = \Delta l = kF = k\,x(t)$    (1.82)

This system is linear.
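The failure of homogeneity is easy to demonstrate numerically. A minimal Matlab sketch contrasting the linear system of Eq. 1.76 with the non-linear one of Eq. 1.77 (the numeric values of R, a, and x are arbitrary assumptions):

    % homogeneity check for two systems of Example 1.3
    R = 2;  a = 5;  x = 3;
    sys1 = @(x) x / R;              % current through R: linear (Eq. 1.76)
    sys2 = @(x) x.^2 / R;           % power consumed by R: non-linear (Eq. 1.77)
    fprintf('sys1: O[a x] = %g, a O[x] = %g\n', sys1(a*x), a*sys1(x));  % equal
    fprintf('sys2: O[a x] = %g, a O[x] = %g\n', sys2(a*x), a*sys2(x));  % differ

A single counterexample like this is enough to disprove linearity, while proving linearity requires the algebraic argument given in the example above.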

Figure 1.6 Response of a continuous LTI system

1.7 Signals Through LTI Systems (Continuous)

If the input to an LTI system is an impulse $x(t) = \delta(t)$ at $t = 0$, then the response of the system, called the impulse response function, is:

$h(t) = O[\delta(t)]$    (1.83)

We now show that, given the impulse response $h(t)$ of an LTI system, we can find its response to any input $x(t)$. First, according to Eq. 1.9, we can express the input as:

$x(t) = \int_{-\infty}^{\infty} x(\tau)\,\delta(t - \tau)\,d\tau$    (1.84)

Next, according to Eqs. 1.70 and 1.73, we have:

$y(t) = O[x(t)] = O\!\left[\int_{-\infty}^{\infty} x(\tau)\,\delta(t - \tau)\,d\tau\right] = \int_{-\infty}^{\infty} x(\tau)\,O[\delta(t - \tau)]\,d\tau = \int_{-\infty}^{\infty} x(\tau)\,h(t - \tau)\,d\tau$    (1.85)

This process is illustrated in Fig. 1.6. The integral on the right-hand side above is called the continuous convolution, which is generally defined as an operation of two continuous functions $x(t)$ and $y(t)$:

$z(t) = x(t) * y(t) = \int_{-\infty}^{\infty} x(\tau)\,y(t - \tau)\,d\tau = \int_{-\infty}^{\infty} y(\tau)\,x(t - \tau)\,d\tau = y(t) * x(t)$    (1.86)

Note that convolution is commutative, i.e., $x(t) * y(t) = y(t) * x(t)$.
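The continuous convolution of Eq. 1.86 can be approximated numerically by sampling both functions on a fine grid and scaling the discrete convolution by the grid step, so that the sum approximates the integral. A Matlab sketch (the first-order impulse response with a = 2 and the unit-step input are assumptions for illustration; the closed-form result it reproduces is derived in Example 1.4 later in this chapter):

    % approximate continuous convolution y = h * x on a grid (Eq. 1.86)
    dt = 1e-3;  t = 0:dt:5;
    a  = 2;
    h  = exp(-a*t);                    % impulse response h(t) = exp(-a t) u(t)
    x  = ones(size(t));                % unit-step input x(t) = u(t)
    y  = conv(x, h) * dt;              % Riemann-sum approximation of the integral
    y  = y(1:length(t));               % keep the segment aligned with t
    plot(t, y, t, (1 - exp(-a*t))/a)   % numerical and closed-form results overlap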

In particular, if the input to an LTI system is a complex exponential function:

$x(t) = e^{st} = e^{(\sigma + j\omega)t} = [\cos(\omega t) + j\sin(\omega t)]\,e^{\sigma t}$    (1.87)

where $s = \sigma + j\omega$ is a complex parameter, the corresponding output is:

$y(t) = O[e^{st}] = \int_{-\infty}^{\infty} h(\tau)\,e^{s(t-\tau)}\,d\tau = e^{st}\int_{-\infty}^{\infty} h(\tau)\,e^{-s\tau}\,d\tau = H(s)\,e^{st}$    (1.88)

where $H(s)$ is a constant (independent of the time variable $t$) defined as:

$H(s) = \int_{-\infty}^{\infty} h(\tau)\,e^{-s\tau}\,d\tau$    (1.89)

This is called the transfer function (TF) of the continuous LTI system, which is the Laplace transform of the impulse response function $h(t)$ of the system, to be discussed in Chapter 5. We can rewrite Eq. 1.88 as an eigenequation:

$O[e^{st}] = H(s)\,e^{st}$    (1.90)

where the constant $H(s)$ and the complex exponential $e^{st}$ are, respectively, the eigenvalue and the corresponding eigenfunction of the LTI system, i.e., the response of the system to the complex exponential input $e^{st}$ is equal to the input multiplied by the constant $H(s)$. Also note that the complex exponential $e^{st}$ is the eigenfunction of any continuous LTI system, independent of its specific impulse response $h(t)$.

In particular, when $s = j\omega = j2\pi f$ (i.e., $\sigma = 0$), $H(s)$ becomes:

$H(j\omega) = \int_{-\infty}^{\infty} h(\tau)\,e^{-j2\pi f\tau}\,d\tau = \int_{-\infty}^{\infty} h(\tau)\,e^{-j\omega\tau}\,d\tau$    (1.91)

This is the frequency response function (FRF) of the system, which is the Fourier transform of the impulse response function $h(t)$, to be discussed in Chapter 3. Different notations, such as $H(f)$ and $H(\omega)$, are also used for the FRF as a function of frequency $f$ or $\omega = 2\pi f$ in various parts of the literature, depending on the convention adopted by the authors. We may use any of these depending on the context and convention.

Given the FRF $H(j\omega)$ of a system, its response to an input $x(t) = e^{j\omega_0 t}$ with a specific frequency $\omega_0 = 2\pi f_0$ can be found by evaluating Eq. 1.88 at $s = j\omega_0$:

$y(t) = O[e^{j\omega_0 t}] = H(j\omega_0)\,e^{j\omega_0 t}$    (1.92)

Moreover, if the input $x(t)$ can be written as a linear combination of a set of complex exponentials:

$x(t) = \sum_{k=-\infty}^{\infty} X[k]\,e^{jk\omega_0 t}$    (1.93)

where $X[k]$ is the weighting coefficient for the $k$th complex exponential of frequency $k\omega_0$, then, due to the linearity of the system, its output is:

$y(t) = O[x(t)] = O\!\left[\sum_{k=-\infty}^{\infty} X[k]\,e^{jk\omega_0 t}\right] = \sum_{k=-\infty}^{\infty} X[k]\,O[e^{jk\omega_0 t}] = \sum_{k=-\infty}^{\infty} X[k]\,H(jk\omega_0)\,e^{jk\omega_0 t} = \sum_{k=-\infty}^{\infty} Y[k]\,e^{jk\omega_0 t}$    (1.94)

where $Y[k] = X[k]\,H(jk\omega_0)$ is the $k$th coefficient for the output. The result can be generalized to cover signals composed of uncountably infinite complex exponentials:

$x(t) = \int_{-\infty}^{\infty} X(f)\,e^{j2\pi ft}\,df$    (1.95)

where $X(f)$ is the weighting function for all exponentials with frequency in the range $-\infty < f < \infty$; then its output is:

$y(t) = O[x(t)] = O\!\left[\int_{-\infty}^{\infty} X(f)\,e^{j2\pi ft}\,df\right] = \int_{-\infty}^{\infty} X(f)\,O[e^{j2\pi ft}]\,df = \int_{-\infty}^{\infty} X(f)\,H(f)\,e^{j2\pi ft}\,df = \int_{-\infty}^{\infty} Y(f)\,e^{j2\pi ft}\,df$    (1.96)

where $Y(f) = X(f)\,H(f)$ is the weighting function for the output.

The results above are of great significance, as they indicate that we can obtain the response of an LTI system, described by its transfer function $H(s)$ or equivalently its impulse response function $h(t)$, to any input $x(t)$ in the form of a linear combination of a set of complex exponentials. This is also an important conclusion of the Fourier transform theory to be considered in Chapter 3.

An LTI system is stable if its response to any bounded input is also bounded:

if $|x(t)| < B_x$ then $|y(t)| < B_y$    (1.97)

As the input and output of an LTI system are related by convolution:

$y(t) = h(t) * x(t) = \int_{-\infty}^{\infty} h(\tau)\,x(t - \tau)\,d\tau$    (1.98)

we have:

$|y(t)| = \left|\int_{-\infty}^{\infty} h(\tau)\,x(t - \tau)\,d\tau\right| \le \int_{-\infty}^{\infty} |h(\tau)|\,|x(t - \tau)|\,d\tau < B_x\int_{-\infty}^{\infty} |h(\tau)|\,d\tau < B_y$    (1.99)

which obviously requires:

$\int_{-\infty}^{\infty} |h(\tau)|\,d\tau < \infty$    (1.100)

In other words, if the impulse response function $h(t)$ of an LTI system is absolutely integrable, then the system is stable, i.e., Eq. 1.100 is the sufficient condition

for an LTI system to be stable. We can show that this condition is also necessary, i.e., the impulse response functions of all stable LTI systems are absolutely integrable.

An LTI system is causal if its output $y(t)$ depends only on the current and past input $x(t)$ (but not the future). If the system is initially at rest with zero output $y(t) = 0$ for $t < 0$, then its response $y(t) = h(t)$ to an impulse $x(t) = \delta(t)$ at the moment $t = 0$ will be at rest before the moment $t = 0$, i.e., $h(t) = h(t)\,u(t)$. Its response to a general input $x(t)$ is:

$y(t) = h(t) * x(t) = \int_{-\infty}^{\infty} h(\tau)\,x(t - \tau)\,d\tau = \int_{0}^{\infty} h(\tau)\,x(t - \tau)\,d\tau$    (1.101)

Moreover, if the input begins at a specific moment, e.g., $t = 0$, i.e., $x(t) = x(t)\,u(t)$ and $x(t - \tau) = 0$ for $\tau > t$, then we have:

$y(t) = h(t) * x(t) = \int_{0}^{\infty} h(\tau)\,x(t - \tau)\,d\tau = \int_{0}^{t} h(\tau)\,x(t - \tau)\,d\tau$    (1.102)

1.8 Signals Through LTI Systems (Discrete)

Similar to the above discussion for continuous signals and systems, the following results can be obtained for discrete signals and systems. First, as shown in Eq. 1.3, any discrete signal can be written as:

$x[n] = \sum_{m=-\infty}^{\infty} x[m]\,\delta[n - m]$    (1.103)

Let the impulse response of a discrete LTI system be:

$h[n] = O[\delta[n]]$    (1.104)

then its response to the signal $x[n]$ is:

$y[n] = O[x[n]] = O\!\left[\sum_{m=-\infty}^{\infty} x[m]\,\delta[n - m]\right] = \sum_{m=-\infty}^{\infty} x[m]\,O[\delta[n - m]] = \sum_{m=-\infty}^{\infty} x[m]\,h[n - m] = \sum_{m=-\infty}^{\infty} x[n - m]\,h[m]$    (1.105)

This process is illustrated in Fig. 1.7. The last summation in Eq. 1.105 is called the discrete convolution, which is generally defined as an operation of two discrete functions $x[n]$ and $y[n]$:

$z[n] = x[n] * y[n] = \sum_{m=-\infty}^{\infty} x[m]\,y[n - m] = \sum_{m=-\infty}^{\infty} y[m]\,x[n - m] = y[n] * x[n]$    (1.106)

Figure 1.7 Response of a discrete LTI system

Note that convolution is commutative, i.e., $x[n] * y[n] = y[n] * x[n]$. Similar to the continuous case, if the system is causal and the input $x[n]$ is zero until $n = 0$, we have:

$y[n] = \sum_{m=0}^{n} x[m]\,h[n - m] = \sum_{m=0}^{n} x[n - m]\,h[m]$    (1.107)

In particular, if the input to an LTI system is a complex exponential function:

$x[n] = e^{sn} = (e^s)^n = z^n$    (1.108)

where $s = \sigma + j\omega$ as defined above, and $z$ is defined as $z = e^s$, then according to Eq. 1.105 the corresponding output is:

$y[n] = O[z^n] = \sum_{k=-\infty}^{\infty} h[k]\,z^{n-k} = z^n\sum_{k=-\infty}^{\infty} h[k]\,z^{-k} = H(z)\,z^n$    (1.109)

where $H(z)$ is a constant (independent of the time variable $n$) defined as:

$H(z) = \sum_{k=-\infty}^{\infty} h[k]\,z^{-k}$    (1.110)

This is called the transfer function (TF) of the discrete LTI system, which is the Z-transform of the impulse response $h[n]$ of the system, to be discussed in Chapter 5. We note that Eq. 1.109 is an eigenequation, where the constant $H(z)$ and the complex exponential $z^n$ are, respectively, the eigenvalue and the corresponding eigenfunction of the LTI system. Also note that the complex exponential $z^n$ is the eigenfunction of any discrete LTI system, independent of its specific impulse response $h[n]$.

In particular, when $s = j\omega$ ($\sigma = 0$) and $z = e^s = e^{j\omega}$, $H(z)$ becomes:

$H(e^{j\omega}) = \sum_{k=-\infty}^{\infty} h[k]\,e^{-j2\pi kf} = \sum_{k=-\infty}^{\infty} h[k]\,e^{-jk\omega}$    (1.111)

This is the frequency response function (FRF) of the system, which is the Fourier transform of the discrete impulse response function $h[n]$, to be discussed in Chapter 4. As in the continuous case, different notations, such as $H(f)$ and $H(\omega)$, can also be used for the FRF as a function of frequency $f$ or $\omega = 2\pi f$, depending on the convention adopted by the authors.
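The eigenfunction property of Eqs. 1.109 and 1.111 can be observed directly with Matlab's filter function: feed a complex exponential through an LTI system and, once the initial transient has passed, the output is the input scaled by the constant $H(e^{j\omega_0})$. The short FIR impulse response below is an arbitrary assumption for illustration:

    % eigenfunction check: y[n] = H(e^{j w0}) e^{j w0 n} (Eqs. 1.109 and 1.111)
    h  = [0.5 0.3 0.2];                          % impulse response of an assumed FIR system
    w0 = pi/8;
    n  = 0:99;
    x  = exp(1j*w0*n);                           % complex exponential input
    y  = filter(h, 1, x);                        % system output
    H0 = sum(h .* exp(-1j*w0*(0:length(h)-1)));  % H(e^{j w0}) from Eq. 1.111
    disp([y(50)/x(50), H0])                      % the two agree after the transient

The ratio y[n]/x[n] is exactly H0 for every n past the length of h, which is the eigenequation in action.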

also be used for the FRF, as a function of frequency f or ω = 2πf, depending on the convention adopted in the literature.

Given H(e^{jω}) of a discrete system, its response to a discrete input x[n] = zⁿ = e^{jω₀n} with a specific frequency ω₀ = 2πf₀ can be found by evaluating Eq. 1.109 at z = e^{jω₀}:

$$y[n]=\mathcal{O}[e^{j\omega_0 n}]=H(e^{j\omega_0})e^{j\omega_0 n} \qquad(1.112)$$

Moreover, if the input x[n] can be written as a linear combination of a set of complex exponentials:

$$x[n]=\sum_{k=0}^{N-1}X[k]e^{jk\omega_0 n} \qquad(1.113)$$

where ω₀ = 2π/N and X[k] (0 ≤ k < N) are a set of constant coefficients, then, due to the linearity of the system, its output is:

$$y[n]=\mathcal{O}[x[n]]=\mathcal{O}\Big[\sum_{k=0}^{N-1}X[k]e^{jk\omega_0 n}\Big]=\sum_{k=0}^{N-1}X[k]\,\mathcal{O}[e^{jk\omega_0 n}]=\sum_{k=0}^{N-1}X[k]H(e^{jk\omega_0})e^{jk\omega_0 n}=\sum_{k=0}^{N-1}Y[k]e^{jk\omega_0 n} \qquad(1.114)$$

where Y[k] = X[k]H(e^{jkω₀}) is the kth coefficient of the output. The result can be generalized to cover signals composed of uncountably infinite complex exponentials:

$$x[n]=\int_{0}^{F}X(f)e^{j2\pi fn/F}df \qquad(1.115)$$

where X(f) is the weighting function for all exponentials with frequencies in the range 0 < f < F; then its output is:

$$y[n]=\mathcal{O}[x[n]]=\mathcal{O}\Big[\int_{0}^{F}X(f)e^{j2\pi fn/F}df\Big]=\int_{0}^{F}X(f)\,\mathcal{O}[e^{j2\pi fn/F}]df=\int_{0}^{F}X(f)H(e^{j2\pi f/F})e^{j2\pi fn/F}df=\int_{0}^{F}Y(f)e^{j2\pi fn/F}df \qquad(1.116)$$

where Y(f) = X(f)H(e^{j2πf/F}) is the weighting function for the output.

The significance of this result is that we can obtain the response of a discrete LTI system, described by H(z) or equivalently h[n], to any input x[n] given in the form of a linear combination of a set of complex exponentials. This is an important conclusion of the discrete-time Fourier transform theory to be considered in Chapter ??.
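As a numerical sanity check of Eqs. 1.110 and 1.111, the following Matlab/Octave sketch (an illustration, not part of the original text) uses h[n] = aⁿu[n] with a = 0.5, whose FRF has the closed geometric-series form 1/(1 − ae^{−jω}); the infinite sum is truncated where its terms become negligible:

    % FRF of h[n] = a^n u[n], evaluated by the defining sum and in closed form
    a = 0.5;
    w = linspace(-pi, pi, 512);            % frequency grid
    k = 0:100;                             % truncation of the infinite sum
    H_sum = (a.^k) * exp(-1j * k' * w);    % sum_k h[k] e^{-jkw}, Eq. 1.111
    H_closed = 1 ./ (1 - a*exp(-1j*w));    % closed form of the geometric series
    max(abs(H_sum - H_closed))             % negligible: the two agree

By Eq. 1.112, the response of this system to x[n] = e^{jω₀n} is then simply H(e^{jω₀})e^{jω₀n}.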

Similar to a stable continuous LTI system, a stable discrete LTI system's response to any bounded input is also bounded for all n:

$$\text{if } |x[n]|<B_x \text{ then } |y[n]|<B_y \qquad(1.117)$$

As the output and input of an LTI system are related by convolution

$$y[n]=h[n]*x[n]=\sum_{m=-\infty}^{\infty}h[m]x[n-m] \qquad(1.118)$$

we have:

$$|y[n]|=\Big|\sum_{m=-\infty}^{\infty}h[m]x[n-m]\Big|\le\sum_{m=-\infty}^{\infty}|h[m]||x[n-m]|<B_x\sum_{m=-\infty}^{\infty}|h[m]|<B_y \qquad(1.119)$$

which obviously requires:

$$\sum_{m=-\infty}^{\infty}|h[m]|<\infty \qquad(1.120)$$

In other words, if the impulse response function h[n] of an LTI system is absolutely summable, then the system is stable, i.e., Eq. 1.120 is the sufficient condition for an LTI system to be stable. We can show that this condition is also necessary, i.e., all stable LTI systems' impulse response functions are absolutely summable.

Also, a discrete LTI system is causal if its output y[n] depends only on the current and past input x[n] (but not the future). Assuming the system is initially at rest with zero output y[n] = 0 for n < 0, its response y[n] = h[n] to an impulse x[n] = δ[n] at moment n = 0 will be at rest before the moment n = 0, i.e., h[n] = h[n]u[n]. Its response to a general input x[n] is:

$$y[n]=h[n]*x[n]=\sum_{m=-\infty}^{\infty}h[m]x[n-m]=\sum_{m=0}^{\infty}h[m]x[n-m] \qquad(1.121)$$

Moreover, if the input begins at a specific moment, e.g., n = 0, i.e., x[n] = x[n]u[n] and x[n−m] = 0 for m > n, then we have

$$y[n]=h[n]*x[n]=\sum_{m=0}^{\infty}h[m]x[n-m]=\sum_{m=0}^{n}h[m]x[n-m] \qquad(1.122)$$

1.9 Continuous and Discrete Convolutions

The continuous and discrete convolutions, defined respectively in Eqs. 1.86 and 1.106, are of great importance in the future discussions. Here we further consider how these convolutions can be specifically carried out. First we reconsider the continuous convolution

$$z(t)=x(t)*y(t)=\int_{-\infty}^{\infty}x(\tau)y(t-\tau)d\tau \qquad(1.123)$$

which can be carried out conceptually in the following three steps:

1. Find the time reversal of one of the two functions, say y(τ), by flipping it in time to get y(−τ);
2. Slide this flipped function along the τ axis to get y(t−τ) as the shift amount t goes from −∞ to ∞;
3. For each shift amount t, find the integral of x(τ)y(t−τ) over all τ, the area of overlap between x(τ) and y(t−τ), which is the convolution z(t) at t.

This process is illustrated in the following example and in Fig. 1.8.

Example 1.4: Let x(t) = u(t) be the input to an LTI system with impulse response function h(t) = e^{−at}u(t) (a first-order system to be considered in Example ??). The output y(t) of the system is:

$$y(t)=h(t)*x(t)=\int_{0}^{t}h(t-\tau)d\tau=\int_{0}^{t}e^{-a(t-\tau)}d\tau=\frac{1}{a}e^{-at}e^{a\tau}\Big|_{0}^{t}=\frac{1}{a}e^{-at}(e^{at}-1)=\frac{1}{a}(1-e^{-at}),\quad(t>0) \qquad(1.124)$$

The result can be written as y(t) = (1/a)(1−e^{−at})u(t), as it is zero when t < 0. Alternatively, the convolution can also be written as:

$$y(t)=x(t)*h(t)=\int_{-\infty}^{\infty}h(\tau)x(t-\tau)d\tau=\int_{0}^{t}h(\tau)d\tau=\int_{0}^{t}e^{-a\tau}d\tau=-\frac{1}{a}e^{-a\tau}\Big|_{0}^{t}=\frac{1}{a}(1-e^{-at})u(t) \qquad(1.125)$$

Moreover, if the input is

$$x(t)=u(t)-u(t-\tau)=\begin{cases}1 & 0\le t<\tau\\ 0 & \text{else}\end{cases}$$

then, due to the previous result and the linearity of the system, its output is:

$$y(t)=h(t)*[u(t)-u(t-\tau)]=h(t)*u(t)-h(t)*u(t-\tau) \qquad(1.126)$$
$$=\frac{1}{a}\big[(1-e^{-at})u(t)-(1-e^{-a(t-\tau)})u(t-\tau)\big] \qquad(1.127)$$

This result is shown in Fig. 1.9.

Although convolution and cross-correlation (Eq. 1.45) are two different operations, they look similar and are closely related. If we flip one of the two functions in a convolution, it becomes the same as the cross-correlation:

$$x(t)*y(-t)=\int_{-\infty}^{\infty}x(\tau)y(\tau-t)d\tau=r_{xy}(t)=x(t)\star y(t) \qquad(1.128)$$

In other words, if one of the signals is even, y(t) = y(−t), then we have x(t)*y(t) = x(t)⋆y(t).
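The closed-form step response of Example 1.4 can be confirmed numerically; in this Matlab/Octave sketch (the value of a and the time grid are arbitrary choices, not from the text) the convolution integral is approximated by a Riemann sum:

    % Step response of h(t) = exp(-a*t)u(t), compared with (1/a)(1 - exp(-a*t))
    a = 2; dt = 1e-3; t = 0:dt:5;
    h = exp(-a*t);                    % sampled impulse response, t >= 0
    x = ones(size(t));                % unit step input u(t)
    y = conv(h, x) * dt;              % Riemann-sum approximation of Eq. 1.124
    y = y(1:length(t));               % keep the samples aligned with t
    y_exact = (1 - exp(-a*t)) / a;
    max(abs(y - y_exact))             % small, on the order of dt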

Figure 1.8 The convolution of two functions. The three steps are shown top-down, then left to right. The shaded area represents the convolution evaluated at a specific time moment such as t = t₂, t = t₃, and t = t₄.

Figure 1.9 The linearity of convolution. Given y₁(t) = h(t)*u(t) and y₂(t) = h(t)*u(t−τ), then h(t)*[u(t)−u(t−τ)] = y₁(t)−y₂(t).

Example 1.5: Let x(t) = e^{−at}u(t) and y(t) = e^{−bt}u(t), where both a and b are positive. We first find their convolution:

$$x(t)*y(t)=\int_{-\infty}^{\infty}x(\tau)y(t-\tau)d\tau \qquad(1.129)$$

As y(t−τ) can be written as:

$$y(t-\tau)=e^{-b(t-\tau)}u(t-\tau)=\begin{cases}e^{-b(t-\tau)} & \tau<t\\ 0 & \tau>t\end{cases} \qquad(1.130)$$

we have

$$x(t)*y(t)=\int_{0}^{t}e^{-a\tau}e^{-b(t-\tau)}d\tau=e^{-bt}\int_{0}^{t}e^{-(a-b)\tau}d\tau=\frac{1}{a-b}\big(e^{-bt}-e^{-at}\big)=\frac{1}{b-a}\big(e^{-at}-e^{-bt}\big)=y(t)*x(t)$$

Next we find the cross-correlation x(t)⋆y(t):

$$x(t)\star y(t)=\int_{-\infty}^{\infty}x(\tau)y(\tau-t)d\tau \qquad(1.131)$$

Consider two cases. When t > 0, the above becomes:

$$\int_{t}^{\infty}e^{-a\tau}e^{-b(\tau-t)}d\tau=e^{bt}\int_{t}^{\infty}e^{-(a+b)\tau}d\tau=\frac{e^{-at}}{a+b}u(t) \qquad(1.132)$$

When t < 0, the above becomes:

$$\int_{0}^{\infty}e^{-a\tau}e^{-b(\tau-t)}d\tau=e^{bt}\int_{0}^{\infty}e^{-(a+b)\tau}d\tau=\frac{e^{bt}}{a+b}u(-t) \qquad(1.133)$$

Combining these two cases, we have:

$$x(t)\star y(t)=\frac{1}{a+b}\begin{cases}e^{-at} & t>0\\ e^{bt} & t<0\end{cases} \qquad(1.134)$$

Example 1.6: Let x[n] = u[n] be the input to a discrete LTI system with impulse response h[n] = aⁿu[n] (|a| < 1). The output y[n] is the following convolution (illustrated in Fig. 1.10):

$$y[n]=h[n]*x[n]=\sum_{m=-\infty}^{\infty}h[m]x[n-m]=\sum_{m=0}^{n}h[m]=\sum_{m=0}^{n}a^{m}=\frac{1-a^{n+1}}{1-a} \qquad(1.135)$$

Here we have used the geometric series formula:

$$\sum_{n=0}^{N}x^{n}=\frac{1-x^{N+1}}{1-x} \qquad(1.136)$$
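Before turning to the alternative form of the same convolution below, Eq. 1.135 can be checked numerically; this Matlab/Octave sketch uses a = 1/2, the value considered at the end of this example:

    % y[n] = (1 - a^(n+1))/(1 - a) for x[n] = u[n] and h[n] = a^n u[n]
    a = 0.5; N = 10; n = 0:N;
    h = a.^n;                            % truncated impulse response
    x = ones(1, N+1);                    % unit step input
    y = conv(h, x);                      % discrete convolution
    y = y(1:N+1);                        % these samples are unaffected by truncation
    y_exact = (1 - a.^(n+1)) / (1 - a);  % closed form: 1, 3/2, 7/4, 15/8, ...
    max(abs(y - y_exact))                % zero up to rounding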

Figure 1.10 Discrete convolution.

Alternatively, the convolution can also be written as:

$$x[n]*h[n]=\sum_{m=-\infty}^{\infty}x[m]h[n-m]=\sum_{m=0}^{n}h[n-m]=a^{n}\sum_{m=0}^{n}a^{-m}=a^{n}\,\frac{1-a^{-(n+1)}}{1-a^{-1}}=\frac{1-a^{n+1}}{1-a}$$

If a = 1/2, the output is y[n] = [..., 0, 0, 1, 3/2, 7/4, 15/8, ...], and when n → ∞, y[n] → 1/(1−a) = 2, as shown in the bottom panel of Fig. 1.10.

1.10 Homework Problems

1. Given two square impulses as shown below:

$$r_a(t)=\begin{cases}1 & |t|<a/2\\ 0 & \text{else}\end{cases},\qquad r_b(t)=\begin{cases}1 & |t|<b/2\\ 0 & \text{else}\end{cases} \qquad(1.137)$$

where we assume b > a, find their convolution x(t) = r_a(t)*r_b(t) in analytical form (piecewise functions, i.e., one expression for each particular time interval) as well as graphic form.

2. Given a triangle wave, an isosceles triangle, as shown below:

$$s_a(t)=\begin{cases}1+t/a & -a<t<0\\ 1-t/a & 0<t<a\end{cases} \qquad(1.138)$$

Find the convolution s_a(t)*s_a(t) in analytical form (piecewise function) as well as graphic form.

3. Prove the identity in Eq. 1.28:

$$\int_{-\infty}^{\infty}e^{\pm j2\pi ft}dt=\delta(f) \qquad(1.139)$$

Hint: Follow these steps:

a. Change the lower and upper integral limits to −a/2 and a/2, respectively, and show that this definite integral results in a sinc function a sinc(af) of frequency f with a parameter a. A sinc function is defined as sinc(x) = sin(πx)/(πx), and lim_{x→0} sinc(x) = 1.
b. Show that the following integral of this sinc function a sinc(af) is 1 (independent of a):

$$\int_{-\infty}^{\infty}a\,\mathrm{sinc}(af)\,df=1 \qquad(1.140)$$

based on the integral formula:

$$\int_{-\infty}^{\infty}\frac{\sin(x)}{x}dx=\pi \qquad(1.141)$$

c. Let a → ∞ and show that a sinc(af) approaches a unit impulse:

$$\lim_{a\to\infty}a\,\mathrm{sinc}(af)=\delta(f) \qquad(1.142)$$

4. Prove the identity in Eq. 1.30:

$$\int_{0}^{\infty}e^{\pm j2\pi ft}dt=\frac{1}{2}\delta(f)\mp\frac{1}{j2\pi f}=\pi\delta(\omega)\mp\frac{1}{j\omega} \qquad(1.143)$$

Hint: Follow these steps:
a. Introduce an extra term e^{−at} with a real parameter a > 0, so that the integrand becomes e^{−(a+jω)t} and the integral can be carried out. Note that we cannot take the limit a → 0 of the integral result directly, due to the singularity at f = 0.
b. Take the limit a → 0 on the imaginary part, which is odd, without singularity at f = 0.
c. Take the limit on the real part, which is even with a singularity at f = 0. However, show this impulse is one half of a Dirac delta, as its integral over −∞ < f < ∞ is 1/2. You may need to use this integral:

$$\int\frac{dx}{a^2+x^2}=\frac{1}{a}\tan^{-1}\Big(\frac{x}{a}\Big) \qquad(1.144)$$

5. Prove the identity in Eq. 1.33:

$$\frac{1}{T}\int_{0}^{T}e^{\pm j2\pi kt/T}dt=\delta[k] \qquad(1.145)$$

Hint: Use Euler's formula to represent the integrand as:

$$e^{\pm j2\pi kt/T}=\cos\Big(\frac{2\pi t}{T/k}\Big)\pm j\sin\Big(\frac{2\pi t}{T/k}\Big) \qquad(1.146)$$

6. Prove the identity in Eq. 1.35:

$$\sum_{k=-\infty}^{\infty}e^{\pm j2k\pi f/F}=F\sum_{n=-\infty}^{\infty}\delta(f-nF) \qquad(1.147)$$

Hint: Follow these steps:

a. Find the summation of the following series:

$$\sum_{k=-\infty}^{\infty}a^{|k|}e^{kx}=\sum_{k=0}^{\infty}(ae^{x})^{k}+\sum_{k=0}^{\infty}(ae^{-x})^{k}-1 \qquad(1.148)$$

based on the power series formula for |a| < 1:

$$\sum_{k=0}^{\infty}(ae^{x})^{k}=\frac{1}{1-ae^{x}} \qquad(1.149)$$

b. Show that when a = 1 the sum above is zero if f ≠ nF but infinity when f = nF, for any integer n, i.e., the sum is a train of impulses.
c. Show that each impulse is a Dirac delta, a unit impulse, as its integral over the period F with respect to f is 1. Here the result of the previous problem may be needed.

7. Prove the identity in Eq. 1.37:

$$\sum_{m=0}^{\infty}e^{-j2\pi fm}=\frac{1}{2}\sum_{k=-\infty}^{\infty}\delta(f-k)+\frac{1}{1-e^{-j2\pi f}}=\sum_{k=-\infty}^{\infty}\pi\delta(\omega-2k\pi)+\frac{1}{1-e^{-j\omega}} \qquad(1.150)$$

Hint: Follow these steps:
a. Introduce an extra factor aᵐ with a real parameter 0 < a < 1, so that the summation term becomes (ae^{−jω})ᵐ and the summation can be carried out. Note that we cannot take the limit a → 1 directly on the result, due to the singularity at f = k (ω = 2kπ) for any integer value of k.
b. Take the limit a → 1 on the imaginary part, which is odd, without singularity at f = k.
c. Take the limit on the real part, which is even with a singularity at f = k. However, show each impulse is one half of a Dirac delta, as its integral over −1/2 < f−k < 1/2 is 1/2. You may need to use this integral:

$$\int\frac{dx}{a^2+b^2-2ab\cos x}=\frac{2}{a^2-b^2}\tan^{-1}\Big[\frac{a+b}{a-b}\tan\Big(\frac{x}{2}\Big)\Big] \qquad(1.151)$$

8. Prove the identity in Eq. 1.40:

$$\frac{1}{N}\sum_{n=0}^{N-1}e^{\pm j2\pi nm/N}=\sum_{k=-\infty}^{\infty}\delta[m-kN] \qquad(1.152)$$

Hint: Consider the summation on the left-hand side in the following two cases to show that:
a. If m = kN for any integer value of k, the summation is 1;
b. If m ≠ kN, the summation is 0, based on the formula of the geometric series:

$$\sum_{n=0}^{N-1}x^{n}=\frac{1-x^{N}}{1-x} \qquad(1.153)$$

9. Consider the three signals x(t), y(t) and z(t) in Fig. 1.11.

Figure 1.11 Orthogonal Projection
Figure 1.12 Impulse and input of an LTI system
Figure 1.13 Impulse and input of an LTI system

Give the expressions for y(t) in terms of x(t). Give the expressions for z(t) in terms of x(t). Give the expressions for y(t) in terms of z(t). Give the expressions for z(t) in terms of y(t).

10. Let x = [1, 1, 1, 1, 1, 1, 1, 1]^T be the input to an LTI system with impulse response h = [1, 2, 3]^T. Find the output y[n] = h[n]*x[n], and write a Matlab program to confirm your result. Note that given the input x[n] and the corresponding output y[n], it is difficult to find h[n]; similarly, given the output y[n] and the impulse response h[n], it is also difficult to find the input x[n]. As we will see later, such difficulties can be resolved by the Fourier transform method in the frequency domain.

11. The impulse response h(t) of an LTI system is shown in Fig. 1.13, and the input signal is $x(t)=\sum_{k=-\infty}^{\infty}\delta(t-kT)$. Draw the system's response y(t) = h(t)*x(t) when T takes each of these values: T = 2, T = 1, T = 2/3, T = 1/2, and T = 1/ .

12. The impulse response of an LTI system is

$$h(t)=\begin{cases}1 & 0<t<T\\ 0 & \text{else}\end{cases} \qquad(1.154)$$

Find the response of the system to an input x(t) = cos(2πf₀t), and then write a Matlab program to confirm your result.

13. The impulse response of a discrete LTI system is h[n] = aⁿu[n] with |a| < 1, and the input is x[n] = cos(2πnf₀). Find the corresponding output y[n] = h[n]*x[n].

Hint: when needed, any complex expression (such as 1/(1 − ae^{−j2πf₀})) can be represented in polar form re^{jθ}, but the magnitude r and angle θ need to be expressed in terms of the given parameters (such as a and f₀).

2 Vector Spaces and Signal Representation

In this chapter we discuss some basic concepts of Hilbert space and the related operations and properties, as the mathematical foundation for the topics of the subsequent chapters. Specifically, based on the concept of unitary transformation in a Hilbert space, all of the unitary transform methods to be considered in the following chapters can be treated from a unified point of view: they are just a set of different rotations of the standard basis of the Hilbert space in which a given signal, as a vector, resides. By such a rotation the signal can be better represented, in the sense that the various signal processing needs, such as noise filtering, information extraction, and data compression, can all be carried out more effectively and efficiently.

2.1 Inner Product Space

2.1.1 Vector Space

In our future discussion, any signal, either a continuous one represented as a time function x(t), or a discrete one represented as a vector x = [..., x[n], ...]^T, will be considered as a vector in a vector space, which is just a generalization of the familiar concept of N-dimensional (N-D) space, formally defined as below.

Definition 2.1. A vector space is a set V with two operations, addition and scalar multiplication, defined for its members, referred to as vectors.

1. Vector addition maps any two vectors x, y ∈ V to another vector x + y ∈ V, satisfying the following properties:
- Commutativity: x + y = y + x
- Associativity: x + (y + z) = (x + y) + z
- Existence of zero: there is a vector 0 ∈ V such that 0 + x = x + 0 = x
- Existence of inverse: for any vector x ∈ V, there is another vector −x ∈ V such that x + (−x) = 0
2. Scalar multiplication maps a vector x ∈ V and a real or complex scalar a ∈ C to another vector ax ∈ V, with the following properties:
- a(x + y) = ax + ay
- (a + b)x = ax + bx

- (ab)x = a(bx)
- 1x = x

Listed below is a set of typical vector spaces for various types of signals of interest.

- N-dimensional vector space R^N or C^N: This space contains all N-dimensional (N-D) vectors expressed as an N-tuple, an ordered list of N elements (or components):

$$x=\begin{bmatrix}x[1]\\x[2]\\\vdots\\x[N]\end{bmatrix}=[x[1],x[2],\cdots,x[N]]^{T} \qquad(2.1)$$

which can be used to represent a discrete signal containing N samples. We will always represent a vector as a column vector, or the transpose of a row vector. The space is denoted by C^N if the elements are complex, x[n] ∈ C, or by R^N if they are all real, x[n] ∈ R (n = 1, ..., N). Sometimes the N elements of a vector are alternatively indexed by n = 0, ..., N−1 to gain some convenience, as can be seen in future chapters.

- A vector space can be defined to contain all M×N matrices, each composed of N M-D column vectors:

$$X=[x_1,\cdots,x_N]=\begin{bmatrix}x[1,1]&x[1,2]&\cdots&x[1,N]\\x[2,1]&x[2,2]&\cdots&x[2,N]\\\vdots&\vdots&\ddots&\vdots\\x[M,1]&x[M,2]&\cdots&x[M,N]\end{bmatrix} \qquad(2.2)$$

where the nth column is an M-D vector x_n = [x[1,n], ..., x[M,n]]^T. Such a matrix can be converted to an MN-D vector by cascading all of the column (or row) vectors. A matrix X can be used to represent a 2-D signal, such as an image.

- l² space: The dimension N of R^N or C^N can be extended to infinity, so that a vector in the space becomes a sequence x = [..., x[n], ...]^T for 0 ≤ n < ∞ or −∞ < n < ∞. If all vectors are square-summable, the space is denoted by l². All discrete energy signals are vectors in l².

- L² space: A vector space can also be a set of real- or complex-valued continuous functions x(t) defined over either a finite range such as 0 ≤ t < T, or an infinite range −∞ < t < ∞. If all functions are square-integrable, the space is denoted by L². All continuous energy signals are vectors in L².

Note that the term vector, generally denoted by x, may be interpreted in two different ways. First, in the most general sense, it represents a member of a

vector space, such as any of the vector spaces considered above, e.g., a function x = x(t) ∈ L². Second, in a more narrow sense, it can also represent a tuple of N elements, an N-D vector x = [x[1], ..., x[N]]^T ∈ C^N, where N may be infinite. It should be clear from the context what a vector x represents in our future discussion.

Definition 2.2. The sum of two subspaces S₁ ⊂ V and S₂ ⊂ V of a vector space V is defined as

$$S_1+S_2=\{s_1+s_2\mid s_1\in S_1,\ s_2\in S_2\} \qquad(2.3)$$

In particular, if S₁ and S₂ are mutually exclusive:

$$S_1\cap S_2=\{0\} \qquad(2.4)$$

then their sum S₁ + S₂ is called a direct sum, denoted by S₁ ⊕ S₂. Moreover, if S₁ ⊕ S₂ = V, then S₁ and S₂ form a direct sum decomposition of the vector space V, and S₁ and S₂ are said to be complementary. The direct sum decomposition of V can be generalized to include multiple subspaces:

$$V=\oplus_{n=1}^{N}S_n=S_1\oplus\cdots\oplus S_N \qquad(2.5)$$

where all subspaces S_n ⊂ V are mutually exclusive:

$$S_m\cap S_n=\{0\},\quad(m\ne n) \qquad(2.6)$$

Definition 2.3. Let S₁ ⊂ V and S₂ ⊂ V be subsets of V with S₁ ⊕ S₂ = V. Then

$$p_{S_1,S_2}(s_1+s_2)=s_1,\quad(s_1\in S_1,\ s_2\in S_2) \qquad(2.7)$$

is called the projection of s₁ + s₂ onto S₁ along S₂.

2.1.2 Inner Product Space

Definition 2.4. An inner product on a vector space V is a function that maps two vectors x, y ∈ V to a scalar ⟨x, y⟩ ∈ C and satisfies the following conditions:

Positive definiteness:

$$\langle x,x\rangle\ge 0,\qquad \langle x,x\rangle=0\ \text{iff}\ x=0 \qquad(2.8)$$

Conjugate symmetry:

$$\langle x,y\rangle=\overline{\langle y,x\rangle} \qquad(2.9)$$

If the vector space is real, the inner product becomes symmetric:

$$\langle x,y\rangle=\langle y,x\rangle \qquad(2.10)$$

Linearity in the first variable:

$$\langle ax+by,z\rangle=a\langle x,z\rangle+b\langle y,z\rangle \qquad(2.11)$$

where a, b ∈ C. The linearity does not apply to the second variable:

$$\langle x,ay+bz\rangle=\overline{\langle ay+bz,x\rangle}=\bar a\,\overline{\langle y,x\rangle}+\bar b\,\overline{\langle z,x\rangle}=\bar a\langle x,y\rangle+\bar b\langle x,z\rangle\ne a\langle x,y\rangle+b\langle x,z\rangle \qquad(2.12)$$

unless the coefficients are real, a, b ∈ R. As a special case, when b = 0, we have:

$$\langle ax,y\rangle=a\langle x,y\rangle,\qquad \langle x,ay\rangle=\bar a\langle x,y\rangle \qquad(2.13)$$

More generally we have:

$$\Big\langle\sum_n c_n x_n,\,y\Big\rangle=\sum_n c_n\langle x_n,y\rangle,\qquad \Big\langle x,\,\sum_n c_n y_n\Big\rangle=\sum_n\bar c_n\langle x,y_n\rangle \qquad(2.14)$$

Definition 2.5. A vector space with an inner product defined is called an inner product space. In particular, when the inner product is defined, C^N is called a unitary space and R^N is called a Euclidean space.

All vector spaces in the future discussion will be assumed to be inner product spaces. Some examples of the inner product are listed below:

- In an N-D vector space, the inner product, also called the dot product, of two vectors x = [x[1], ..., x[N]]^T and y = [y[1], ..., y[N]]^T is defined as:

$$\langle x,y\rangle=x^T\bar y=y^{*}x=[x[1],x[2],\cdots,x[N]]\begin{bmatrix}\bar y[1]\\\bar y[2]\\\vdots\\\bar y[N]\end{bmatrix}=\sum_{n=1}^{N}x[n]\bar y[n] \qquad(2.15)$$

where y* = ȳ^T is the conjugate transpose of y.

- In a space of M×N matrices containing elements x[m,n] (m = 1, ..., M, n = 1, ..., N), the inner product of two such matrices X and Y is defined as:

$$\langle X,Y\rangle=\sum_{m=1}^{M}\sum_{n=1}^{N}x[m,n]\bar y[m,n] \qquad(2.16)$$

This inner product is equivalent to Eq. 2.15 if we cascade the column (or row) vectors of X and Y to form two MN-D vectors.

- In a function space, the inner product of two function vectors x = x(t) and y = y(t) is defined as:

$$\langle x(t),y(t)\rangle=\int_a^b x(t)\bar y(t)dt=\overline{\int_a^b\bar x(t)y(t)dt}=\overline{\langle y(t),x(t)\rangle} \qquad(2.17)$$

In particular, Eq. 1.10 for the sifting property of the delta function δ(t) is an inner product:

$$\langle x(t),\delta(t)\rangle=\int_{-\infty}^{\infty}x(\tau)\delta(\tau)d\tau=x(0) \qquad(2.18)$$

- The inner product of two random variables x and y can be defined as:

$$\langle x,y\rangle=E[x\bar y] \qquad(2.19)$$

If the two random variables have zero means, i.e., μ_x = E(x) = 0 and μ_y = E(y) = 0, the inner product above is also their covariance:

$$\sigma_{xy}^2=E[(x-\mu_x)\overline{(y-\mu_y)}]=E(x\bar y)-\mu_x\bar\mu_y=E(x\bar y)=\langle x,y\rangle \qquad(2.20)$$

The concept of the inner product is of essential importance, as a whole set of other important concepts can be defined based on it.

Definition 2.6. If the inner product of two vectors x and y is zero, ⟨x, y⟩ = 0, they are orthogonal (perpendicular) to each other, denoted by x ⊥ y.

Definition 2.7. The norm (or length) of a vector x ∈ V is defined as:

$$\|x\|=\sqrt{\langle x,x\rangle}=\langle x,x\rangle^{1/2},\quad\text{or}\quad \|x\|^2=\langle x,x\rangle \qquad(2.21)$$

The norm ‖x‖ is nonnegative, and it is zero if and only if x = 0. In particular, if ‖x‖ = 1, then x is said to be normalized and becomes a unit vector. Any vector can be normalized when divided by its own norm: x/‖x‖. The vector norm squared, ‖x‖² = ⟨x, x⟩, can be considered as the energy of the vector.

Specifically, in an N-D unitary space, the norm of a vector x = [x[1], ..., x[N]]^T ∈ C^N is:

$$\|x\|=\sqrt{\langle x,x\rangle}=\Big[x^T\bar x\Big]^{1/2}=\Big[\sum_{n=1}^{N}x[n]\bar x[n]\Big]^{1/2}=\Big[\sum_{n=1}^{N}|x[n]|^2\Big]^{1/2} \qquad(2.22)$$

The total energy contained in this vector is its norm squared:

$$E=\|x\|^2=\langle x,x\rangle=\sum_{n=1}^{N}|x[n]|^2 \qquad(2.23)$$

This norm can be generalized to the p-norm, defined as:

$$\|x\|_p=\Big[\sum_{n=1}^{N}|x[n]|^p\Big]^{1/p} \qquad(2.24)$$

In particular,

$$\|x\|_1=\sum_{n=1}^{N}|x[n]|,\qquad \|x\|_\infty=\max(|x[1]|,\cdots,|x[N]|) \qquad(2.25)$$
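The norms of Eqs. 2.22-2.25 and the inner product convention of Eq. 2.15 map directly onto built-in Matlab/Octave operations; a small sketch with arbitrary values:

    x = [3; -4; 0];
    norm(x, 1)      % 1-norm: |3| + |-4| + |0| = 7
    norm(x, 2)      % 2-norm (length): sqrt(9 + 16) = 5
    norm(x, inf)    % infinity-norm: max(|x[n]|) = 4
    x' * x          % energy ||x||^2 = <x,x> = 25

    % Inner product <u,v> = sum_n u[n] conj(v[n]) of Eq. 2.15:
    u = [1+1i; 2]; v = [2; 1i];
    v' * u          % v' is the conjugate transpose, so this gives <u,v> = 2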

The norm of a matrix X can be defined in different ways, but here we will only consider the element-wise norm defined as:

$$\|X\|_p=\Big[\sum_{m=1}^{M}\sum_{n=1}^{N}|x[m,n]|^p\Big]^{1/p} \qquad(2.26)$$

When p = 2, ‖X‖₂² can be considered as the total energy contained in the 2-D signal X. We will always use this matrix norm in the future.

The concept of the N-D unitary (or Euclidean) space can be generalized to an infinite-dimensional space, in which case the range of the summation covers all integers Z on the entire real axis, −∞ < n < ∞. This norm exists only if the summation converges to a finite value, i.e., the vector x is an energy signal with finite energy:

$$\sum_{n=-\infty}^{\infty}|x[n]|^2<\infty \qquad(2.27)$$

All such vectors x satisfying the above are square-summable, and they form the vector space denoted by l²(Z).

Similarly, in a function space, the norm of a function vector x = x(t) is defined as:

$$\|x\|=\Big(\int_a^b x(t)\bar x(t)dt\Big)^{1/2}=\Big(\int_a^b|x(t)|^2dt\Big)^{1/2} \qquad(2.28)$$

where the lower and upper integral limits a < b are two real numbers, which may be extended to the entire real axis −∞ < t < ∞. This norm exists only if the integral converges to a finite value, i.e., x(t) is an energy signal containing finite energy:

$$\int_{-\infty}^{\infty}|x(t)|^2dt<\infty \qquad(2.29)$$

All such functions x(t) satisfying the above are square-integrable, and they form a function space denoted by L²(R).

All vectors and functions in the future discussion are assumed to be square-summable/integrable, i.e., they represent energy signals containing a finite amount of energy, so that these conditions do not need to be mentioned every time a signal vector is considered.

Theorem 2.1. (The Cauchy-Schwarz inequality) The following inequality holds for any two vectors x, y ∈ V in an inner product space V:

$$|\langle x,y\rangle|^2\le\langle x,x\rangle\langle y,y\rangle,\quad\text{i.e.,}\quad |\langle x,y\rangle|\le\|x\|\,\|y\| \qquad(2.30)$$

Proof: If either x or y is zero, we have ⟨x, y⟩ = 0, i.e., Eq. 2.30 holds (as an equality). Otherwise, we consider the following inner product:

$$0\le\langle x-\lambda y,\,x-\lambda y\rangle=\|x\|^2-\bar\lambda\langle x,y\rangle-\lambda\langle y,x\rangle+|\lambda|^2\|y\|^2 \qquad(2.31)$$

where λ ∈ C is an arbitrary complex number, which can be chosen to be:

$$\lambda=\frac{\langle x,y\rangle}{\|y\|^2},\quad\text{then}\quad \bar\lambda=\frac{\langle y,x\rangle}{\|y\|^2},\quad |\lambda|^2=\frac{|\langle x,y\rangle|^2}{\|y\|^4} \qquad(2.32)$$

Substituting these into Eq. 2.31, we get

$$\|x\|^2\ge\frac{|\langle x,y\rangle|^2}{\|y\|^2},\quad\text{i.e.,}\quad |\langle x,y\rangle|\le\|x\|\,\|y\| \qquad(2.33)$$

Definition 2.8. The angle between two vectors x and y is defined as:

$$\theta=\cos^{-1}\Big(\frac{\langle x,y\rangle}{\|x\|\,\|y\|}\Big) \qquad(2.34)$$

Now the inner product of x and y can also be written as

$$\langle x,y\rangle=\|x\|\,\|y\|\cos\theta \qquad(2.35)$$

In particular, if θ = 0, then cos θ = 1, x and y are collinear, and the inner product ⟨x, y⟩ = ‖x‖‖y‖ in Eq. 2.30 is maximized. If θ = π/2, then cos θ = 0, x and y are orthogonal to each other, and the inner product ⟨x, y⟩ = 0 is minimized.

Definition 2.9. The orthogonal projection of a vector x ∈ V onto another vector y ∈ V is defined as

$$p_y(x)=\frac{\langle x,y\rangle}{\|y\|}\,\frac{y}{\|y\|}=\frac{\langle x,y\rangle}{\langle y,y\rangle}\,y=\|x\|\cos\theta\,\frac{y}{\|y\|} \qquad(2.36)$$

where θ = cos⁻¹[⟨x, y⟩/(‖x‖‖y‖)] is the angle between the two vectors. The projection p_y(x) is a vector, and its norm is a scalar denoted by:

$$p_y(x)=\|p_y(x)\|=\frac{\langle x,y\rangle}{\|y\|}=\|x\|\cos\theta \qquad(2.37)$$

which is sometimes also referred to as the scalar projection, or simply the projection. The projection p_y(x) is illustrated in Fig. 2.1. In particular, if y is a unit (normalized) vector with ‖y‖ = 1, we have

$$p_y(x)=\langle x,y\rangle\,y,\qquad p_y(x)=\langle x,y\rangle \qquad(2.38)$$

In other words, the magnitude of the projection of x onto a unit vector is simply their inner product.

Example 2.1: Find the projection of x = [1, 2]^T onto y = [3, 1]^T.

Figure 2.1 Orthogonal Projection

The angle between the two vectors is

$$\theta=\cos^{-1}\Big(\frac{\langle x,y\rangle}{\sqrt{\langle x,x\rangle\langle y,y\rangle}}\Big)=\cos^{-1}\Big(\frac{5}{\sqrt{5}\sqrt{10}}\Big)=\cos^{-1}0.707=45^\circ \qquad(2.39)$$

The projection of x onto y is:

$$p_y(x)=\frac{\langle x,y\rangle}{\langle y,y\rangle}\,y=\frac{5}{10}\begin{bmatrix}3\\1\end{bmatrix}=\frac{1}{2}\begin{bmatrix}3\\1\end{bmatrix} \qquad(2.40)$$

The norm of the projection is √((3/2)² + (1/2)²) = √10/2 ≈ 1.58, which is of course the same as ‖x‖cos θ = √5 cos 45° ≈ 1.58. If y is normalized to become z = y/‖y‖ = [3, 1]^T/√10, then the projection of x onto z can be simply obtained as their inner product:

$$p_z(x)=\|p_z(x)\|=\langle x,z\rangle=[1,2]\begin{bmatrix}3\\1\end{bmatrix}\Big/\sqrt{10}=5/\sqrt{10}\approx 1.58 \qquad(2.41)$$

Definition 2.10. Two subspaces S₁ ⊂ V and S₂ ⊂ V of an inner product space V are orthogonal, denoted by S₁ ⊥ S₂, if s₁ ⊥ s₂ for any s₁ ∈ S₁ and s₂ ∈ S₂. In particular, if one of the subsets contains only one vector, S₁ = {s₁}, then the vector is orthogonal to the other subset: s₁ ⊥ S₂.

Definition 2.11. The orthogonal complement of a subspace S ⊂ V is the set of all vectors in V that are orthogonal to S:

$$S^{\perp}=\{v\in V\mid v\perp S\}=\{v\in V\mid \langle v,u\rangle=0,\ \forall\,u\in S\} \qquad(2.42)$$

Definition 2.12. An inner product space V that is the direct sum of n mutually orthogonal subspaces S_i ⊂ V (i = 1, ..., n) is called the orthogonal direct sum of these subspaces:

$$V=S_1\oplus\cdots\oplus S_n,\quad\text{with } S_i\perp S_j \text{ for all } i\ne j \qquad(2.43)$$

It can be shown that for a subspace S ⊂ V,

$$S\cap S^{\perp}=\{0\},\quad\text{and}\quad S\oplus S^{\perp}=V \qquad(2.44)$$

Definition 2.13. Let S ⊂ V with S ⊕ S⊥ = V, and let s ∈ S, r ∈ S⊥. Then p_S(s + r) = s is called the orthogonal projection of s + r onto S.
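Example 2.1 above can be reproduced in a few lines, following Eqs. 2.36-2.38 (a minimal Matlab/Octave sketch):

    x = [1; 2]; y = [3; 1];
    p = (y'*x) / (y'*y) * y    % vector projection (<x,y>/<y,y>) y = [1.5; 0.5]
    (y'*x) / norm(y)           % scalar projection <x,y>/||y|| = 5/sqrt(10) = 1.58
    z = y / norm(y);           % normalized y
    z' * x                     % projection onto a unit vector is just <x,z>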

All of these definitions can be intuitively and trivially visualized in a 3-D space spanned by three perpendicular coordinates (x, y, z) representing three mutually orthogonal subspaces. The orthogonal direct sum of these subspaces is the 3-D space, and the orthogonal complement of the subspace in the x direction is the 2-D y-z plane formed by coordinates y and z. The orthogonal projection of a vector v = [1, 2, 3]^T onto the subspace in the x direction is [1, 0, 0]^T, and its orthogonal projection onto the y-z subspace is the 2-D vector [0, 2, 3]^T.

Definition 2.14. The distance between two vectors x, y ∈ V is

$$d(x,y)=\|x-y\| \qquad(2.45)$$

Theorem 2.2. The distance satisfies the following three conditions:

Nonnegativity:
$$d(x,y)\ge 0,\qquad d(x,y)=0\ \text{iff}\ x=y \qquad(2.46)$$

Symmetry:
$$d(x,y)=d(y,x) \qquad(2.47)$$

Triangle inequality:
$$d(x,y)\le d(x,z)+d(z,y) \qquad(2.48)$$

Proof: The first two conditions are self-evident from the definition. We now show that the third condition also holds by considering the following:

$$\|u+v\|^2=\langle u+v,u+v\rangle=\|u\|^2+\langle u,v\rangle+\langle v,u\rangle+\|v\|^2=\|u\|^2+2\,\mathrm{Re}\langle u,v\rangle+\|v\|^2$$
$$\le\|u\|^2+2|\langle u,v\rangle|+\|v\|^2\le\|u\|^2+2\|u\|\|v\|+\|v\|^2=(\|u\|+\|v\|)^2 \qquad(2.49)$$

The first ≤ sign above is due to the fact that the magnitude of a complex number is no less than its real part, and the second ≤ sign is simply the Cauchy-Schwarz inequality. Taking the square root of both sides, we get:

$$\|u+v\|\le\|u\|+\|v\| \qquad(2.50)$$

If we further let u = x − z and v = z − y, the above becomes the triangle inequality:

$$\|x-y\|\le\|x-z\|+\|z-y\| \qquad(2.51)$$

This is Eq. 2.48. Q.E.D.

Definition 2.15. A vector space with a distance defined between any two vectors is called a metric space.

In a unitary space C^N, the Euclidean distance between any two vectors x and y can be defined as the norm of the difference vector x − y:

$$d(x,y)=\|x-y\|=\Big(\sum_{n=1}^{N}|x[n]-y[n]|^2\Big)^{1/2} \qquad(2.52)$$

This distance can be considered as a special case (p = 2) of the more general p-norm distance:

$$d_p(x,y)=\Big(\sum_{n=1}^{N}|x[n]-y[n]|^p\Big)^{1/p} \qquad(2.53)$$

Other commonly used p-norm distances include:

$$d_1(x,y)=\sum_{n=1}^{N}|x[n]-y[n]| \qquad(2.54)$$

$$d_\infty(x,y)=\max(|x[1]-y[1]|,\cdots,|x[N]-y[N]|) \qquad(2.55)$$

In a function space, the p-norm distance between two functions x(t) and y(t) is similarly defined as:

$$d_p(x(t),y(t))=\Big(\int_a^b|x(t)-y(t)|^pdt\Big)^{1/p} \qquad(2.56)$$

In particular, when p = 2, we have:

$$d_2(x(t),y(t))=\|x(t)-y(t)\|=\Big(\int_a^b|x(t)-y(t)|^2dt\Big)^{1/2} \qquad(2.57)$$

2.1.3 Bases of Vector Space

Definition 2.16. In a vector space V, the subspace W of all linear combinations of a set of M vectors b_k ∈ V (k = 1, ..., M) is called the linear span of the vectors:

$$W=\mathrm{span}(b_1,\cdots,b_M)=\Big\{\sum_{k=1}^{M}c[k]b_k\ \Big|\ c[k]\in\mathbb{C}\Big\} \qquad(2.58)$$

Definition 2.17. A set of linearly independent vectors that spans a vector space is called a basis of the space.

The basis vectors are linearly independent, i.e., none of them can be represented as a linear combination of the rest. They are also complete, i.e., if any additional vector were included, the basis would no longer be linearly independent, and removing any of them would result in the inability to represent certain vectors in the space. In other words, a basis is a minimal set of vectors capable of

representing any vector in the space. Also, as any rotation of a given basis results in a different basis, we see that there are infinitely many bases that all span the same space. This idea is of great importance in our future discussion.

For example, any vector x ∈ C^N can be uniquely expressed as a linear combination of N basis vectors b_k:

$$x=\sum_{k=1}^{N}c[k]b_k \qquad(2.59)$$

Moreover, the concept of a finite N-D space spanned by a basis composed of N discrete (countable) linearly independent vectors can be generalized to a vector space V spanned by a basis composed of a family of uncountably infinite vectors b(f). Any vector x ∈ V in the space can be expressed as a linear combination, an integral, of these basis vectors:

$$x=\int_a^b c(f)\,b(f)\,df \qquad(2.60)$$

We see that the index k for the summation in Eq. 2.59 is replaced by a continuous variable f for the integral, and the coefficients c[k] are replaced by a continuous weighting function c(f) for the set of uncountable basis vectors b(f) with a < f < b. The significance of this generalization will become clear during our future discussion of orthogonal transforms of continuous signals x(t).

An important issue is how to find the coefficients c[k] or the weighting function c(f), given the vector x and the basis b_k or b(f). Consider specifically the case of an N-D unitary space C^N as an example. Let {b₁, ..., b_M} be a basis consisting of M linearly independent N-D vectors. Then any vector x ∈ C^N can be represented as a linear combination of these basis vectors:

$$x=\begin{bmatrix}x[1]\\\vdots\\x[N]\end{bmatrix}_{N\times 1}=\sum_{k=1}^{M}c[k]b_k=[b_1,\cdots,b_M]_{N\times M}\begin{bmatrix}c[1]\\\vdots\\c[M]\end{bmatrix}_{M\times 1}=Bc \qquad(2.61)$$

where B = [b₁, ..., b_M] is an N by M matrix composed of the M N-D basis vectors as its columns, and the nth coefficient c[n] is the nth element of an M-D vector c = [c[1], ..., c[M]]^T. This coefficient vector c can be found by solving the equation system in Eq. 2.61. For the solution to exist, the number of unknown coefficients must be no fewer than the number of constraining equations, i.e., M ≥ N. On the other hand, as there can be no more than N independent basis vectors in this N-D space, we must also have M ≤ N. Therefore there must be exactly M = N vectors in a basis of an N-D space. In this case, B is an N by N square matrix with full rank (as all its column vectors are independent), i.e., its inverse B⁻¹ exists and the coefficients can be obtained by solving the system

of N equations for the N unknowns:

$$c=\begin{bmatrix}c[1]\\\vdots\\c[N]\end{bmatrix}=[b_1,\cdots,b_N]^{-1}\begin{bmatrix}x[1]\\\vdots\\x[N]\end{bmatrix}=B^{-1}x \qquad(2.62)$$

The computational complexity of solving this system of N equations and N unknowns is O(N³).

Similarly, we may need to find the weighting function c(f) in Eq. 2.60 in order to represent a vector x in terms of the basis b(f). However, solving this equation for c(f) is not as trivial as solving Eq. 2.61 for c in the previous case of a vector space spanned by a finite and discrete basis. In the next subsection, this problem will be reconsidered when an additional condition is imposed on the basis to make the problem easier to solve.

Example 2.2: A 2-D Euclidean space R² can be spanned by the two basis vectors e₁ = [1, 0]^T and e₂ = [0, 1]^T, by which two vectors a₁ = [1, 0]^T and a₂ = [−1, 2]^T can be represented as:

$$a_1=1e_1+0e_2=\begin{bmatrix}1\\0\end{bmatrix},\qquad a_2=-1e_1+2e_2=\begin{bmatrix}-1\\2\end{bmatrix} \qquad(2.63)$$

As a₁ and a₂ are independent (they are not collinear), they in turn form a basis of the space. Any given vector, such as

$$x=\begin{bmatrix}1\\2\end{bmatrix}=1e_1+2e_2=1\begin{bmatrix}1\\0\end{bmatrix}+2\begin{bmatrix}0\\1\end{bmatrix} \qquad(2.64)$$

can be expressed in terms of {a₁, a₂} as

$$x=\begin{bmatrix}1\\2\end{bmatrix}=c[1]a_1+c[2]a_2=c[1]\begin{bmatrix}1\\0\end{bmatrix}+c[2]\begin{bmatrix}-1\\2\end{bmatrix}=\begin{bmatrix}1&-1\\0&2\end{bmatrix}\begin{bmatrix}c[1]\\c[2]\end{bmatrix} \qquad(2.65)$$

Solving this we get c[1] = 2 and c[2] = 1, so that x can be expressed in terms of a₁ and a₂ as:

$$x=c[1]a_1+c[2]a_2=2\begin{bmatrix}1\\0\end{bmatrix}+1\begin{bmatrix}-1\\2\end{bmatrix}=\begin{bmatrix}1\\2\end{bmatrix} \qquad(2.66)$$

This example is illustrated in Fig. 2.2.

Example 2.3: The previous example in R² can also be extended to a function space defined over 0 ≤ t < 2, spanned by two basis functions:

$$a_1(t)=\begin{cases}1 & 0\le t<1\\ 0 & 1\le t<2\end{cases},\qquad a_2(t)=\begin{cases}-1 & 0\le t<1\\ 2 & 1\le t<2\end{cases} \qquad(2.67)$$

Figure 2.2 Different basis vectors of a 2-D space

Figure 2.3 Representation of a time function by basis functions

A given time function x(t) in the space,

$$x(t)=\begin{cases}1 & 0\le t<1\\ 2 & 1\le t<2\end{cases} \qquad(2.68)$$

can be represented by the two basis functions as:

$$x(t)=c[1]a_1(t)+c[2]a_2(t) \qquad(2.69)$$

To obtain the coefficients c[1] and c[2], we first take the inner products of this equation with the following two functions:

$$e_1(t)=\begin{cases}1 & 0\le t<1\\ 0 & 1\le t<2\end{cases},\qquad e_2(t)=\begin{cases}0 & 0\le t<1\\ 1 & 1\le t<2\end{cases} \qquad(2.70)$$

to get:

$$\langle x(t),e_1(t)\rangle=1=c[1]\langle a_1(t),e_1(t)\rangle+c[2]\langle a_2(t),e_1(t)\rangle=c[1]-c[2]$$
$$\langle x(t),e_2(t)\rangle=2=c[1]\langle a_1(t),e_2(t)\rangle+c[2]\langle a_2(t),e_2(t)\rangle=2c[2] \qquad(2.71)$$

Solving this equation system, which is identical to that in the previous example, we get the same coefficients c[1] = 2 and c[2] = 1. Now x(t) can be expressed as x(t) = 2a₁(t) + a₂(t), as illustrated in Fig. 2.3.
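The coefficients in Examples 2.2 and 2.3 both come from the same 2×2 system of Eq. 2.65; a minimal Matlab/Octave sketch solving it as in Eq. 2.62:

    B = [1 -1; 0 2];   % columns are the basis vectors a1 and a2
    x = [1; 2];
    c = B \ x          % solve Bc = x, giving c = [2; 1]
    B * c              % reconstruction returns x = [1; 2]

The backslash operator solves the linear system by elimination, with the O(N³) cost mentioned in the discussion of Eq. 2.62.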

So far we have only considered inner product spaces of finite dimensions. Additional theory is needed to deal with spaces of infinite dimensions.

Definition 2.18. In a metric space V, a sequence {x₁, x₂, ...} is a Cauchy sequence if for any ε > 0 there exists an N > 0 such that for any m, n > N, d(x_m, x_n) < ε. A metric space V is complete if every Cauchy sequence {x_n} in V converges to some x ∈ V:

$$\lim_{m\to\infty}d(x_m,x)=\lim_{m\to\infty}\|x-x_m\|=0 \qquad(2.72)$$

In other words, for any ε > 0, there exists an N > 0 such that

$$d(x_m,x)<\epsilon\quad\text{if}\quad m>N \qquad(2.73)$$

A complete inner product space is a Hilbert space, denoted by H.

Let {b_k} be a set of orthogonal vectors (k = 1, 2, ...) in H, and let an arbitrary vector x be approximated in an M-D subspace by

$$\hat{x}_M=\sum_{k=1}^{M}c[k]b_k \qquad(2.74)$$

If the least-squares error of this approximation, ‖x − x̂_M‖², converges to zero as M → ∞, i.e.,

$$\lim_{M\to\infty}\|x-\hat{x}_M\|^2=\lim_{M\to\infty}\Big\|x-\sum_{k=1}^{M}c[k]b_k\Big\|^2=0 \qquad(2.75)$$

then this set of orthogonal vectors is said to be complete, called a complete orthogonal system, and the approximation converges to the given vector:

$$\lim_{M\to\infty}\sum_{k=1}^{M}c[k]b_k=\sum_{k=1}^{\infty}c[k]b_k=x \qquad(2.76)$$

In the following, to keep the discussion generic, the lower and upper limits of a summation or an integral may not always be explicitly specified, as the summation or integral may be finite (e.g., from 1 to N) or infinite (e.g., from 0 or −∞ to ∞), depending on each specific case.

2.1.4 Signal Representation by Orthogonal Bases

As shown in Eqs. 2.59 and 2.60, a vector x ∈ V in a vector space can be represented as a linear combination of a set of linearly independent basis vectors, either countable, like b_k, or uncountable, like b(f), that span the space V. However, it may not always be easy to find the weighting coefficients c[k] or the weighting function c(f). As shown in Eq. 2.62 for the simple case of the finite-dimensional space C^N, in order to obtain the coefficient vector c, we need to find the inverse of the N×N matrix B = [b₁, ..., b_N], which may not be a trivial problem if N is large. Moreover, in the case of the uncountable basis b(f) of Eq. 2.60, it is certainly not a trivial problem to find the coefficient function c(f). However, as shown below, finding the coefficients c[k] or the weighting function c(f) becomes most straightforward if the basis is orthogonal.

Theorem 2.3. Let x and y be any two vectors in a Hilbert space H spanned by a complete orthonormal system {u_k} satisfying:

$$\langle u_k,u_l\rangle=\delta[k-l] \qquad(2.77)$$

Then we have:

1. Series expansion:
$$x=\sum_k\langle x,u_k\rangle\,u_k \qquad(2.78)$$

2. Plancherel theorem:
$$\langle x,y\rangle=\sum_k\langle x,u_k\rangle\,\overline{\langle y,u_k\rangle} \qquad(2.79)$$

3. Parseval's theorem:
$$\langle x,x\rangle=\|x\|^2=\sum_k|\langle x,u_k\rangle|^2 \qquad(2.80)$$

Here the dimensionality of the space is not specified, to keep the discussion general.

Proof: As {u_k} is a basis of H, any x ∈ H can be written as:

$$x=\sum_k c[k]u_k \qquad(2.81)$$

Taking the inner product with u_l on both sides, we get

$$\langle x,u_l\rangle=\Big\langle\sum_k c[k]u_k,\,u_l\Big\rangle=\sum_k c[k]\langle u_k,u_l\rangle=\sum_k c[k]\delta[k-l]=c[l] \qquad(2.82)$$

We therefore have c[k] = ⟨x, u_k⟩ and

$$x=\sum_k c[k]u_k=\sum_k\langle x,u_k\rangle\,u_k \qquad(2.83)$$

Here x is expressed as the vector sum of its projections p_{u_k}(x) = ⟨x, u_k⟩u_k onto each of the unit basis vectors u_k (Eq. 2.38), and the scalar coefficient c[k] = ⟨x, u_k⟩ is the norm of the projection. A vector y ∈ H can also be written as:

$$y=\sum_l d[l]u_l=\sum_l\langle y,u_l\rangle\,u_l \qquad(2.84)$$

and we have:

$$\langle x,y\rangle=\Big\langle\sum_k c[k]u_k,\,\sum_l d[l]u_l\Big\rangle=\sum_k c[k]\sum_l\bar d[l]\langle u_k,u_l\rangle=\sum_k c[k]\sum_l\bar d[l]\delta[k-l]=\sum_k c[k]\bar d[k]=\sum_k\langle x,u_k\rangle\,\overline{\langle y,u_k\rangle}=\langle c,d\rangle \qquad(2.85)$$

where c = [..., c[k], ...]^T and d = [..., d[k], ...]^T are the coefficient vectors, of either finite or infinite dimensions. This is the Plancherel theorem. In particular, when x = y, we have:

$$\langle x,x\rangle=\|x\|^2=\sum_k|\langle x,u_k\rangle|^2=\sum_k|c[k]|^2=\langle c,c\rangle=\|c\|^2 \qquad(2.86)$$

This is Parseval's theorem, or identity. Q.E.D.

Eqs. 2.82 and 2.83 can be combined to form a pair of equations:

$$x=\sum_k c[k]u_k=\sum_k\langle x,u_k\rangle\,u_k \qquad(2.88)$$
$$c[k]=\langle x,u_k\rangle,\quad\text{for all } k \qquad(2.89)$$

The first equation is the generalized Fourier expansion, which represents a given vector x as a linear combination of the basis {u_k}, and the weighting coefficient c[k] given in the second equation is the generalized Fourier coefficient.

The results above can be generalized to a vector space spanned by a basis composed of a continuum of uncountable orthogonal basis vectors u(f) satisfying:

$$\langle u(f),u(f')\rangle=\delta(f-f') \qquad(2.90)$$

Under this basis, any vector x in the space can be expressed as:

$$x=\int c(f)\,u(f)\,df \qquad(2.91)$$

The same as Eq. 2.60, this equation also represents a given vector x in the space as a linear combination (an integral) of the basis function u(f), weighted by c(f). However, different from the case in Eq. 2.60, here the weighting function c(f) can be easily obtained, due to the orthogonality of the basis u(f). Taking the inner product with u(f′) on both sides of Eq. 2.91, we get:

$$\langle x,u(f')\rangle=\Big\langle\int c(f)u(f)df,\,u(f')\Big\rangle=\int c(f)\langle u(f),u(f')\rangle df=\int c(f)\delta(f-f')df=c(f') \qquad(2.92)$$

We therefore have

$$c(f)=\langle x,u(f)\rangle \qquad(2.93)$$

representing the projection of x onto the unit basis vector u(f). Now Eq. 2.91 can also be written as:

$$x=\int c(f)\,u(f)\,df=\int\langle x,u(f)\rangle\,u(f)\,df \qquad(2.94)$$

Also, based on Eq. 2.91, we can easily show that Parseval's identity holds:

$$\|x\|^2=\langle x,x\rangle=\int c(f)\bar c(f)df=\langle c(f),c(f)\rangle=\|c(f)\|^2 \qquad(2.95)$$

As a specific example, the space C^N can be spanned by N orthonormal vectors {u₁, ..., u_N}, where the kth basis vector is u_k = [u[1,k], ..., u[N,k]]^T, satisfying:

$$\langle u_k,u_l\rangle=u_k^T\bar u_l=\sum_{n=1}^{N}u[n,k]\bar u[n,l]=\delta[k-l] \qquad(2.96)$$

Any vector x ∈ C^N can be expressed as:

$$x=\sum_{k=1}^{N}c[k]u_k=[u_1,\cdots,u_N]\begin{bmatrix}c[1]\\\vdots\\c[N]\end{bmatrix}=Uc \qquad(2.97)$$

where c = [c[1], ..., c[N]]^T and

$$U=[u_1,\cdots,u_N]=\begin{bmatrix}u[1,1]&\cdots&u[1,N]\\\vdots&\ddots&\vdots\\u[N,1]&\cdots&u[N,N]\end{bmatrix} \qquad(2.98)$$

As the column (and row) vectors in U are orthogonal, it is a unitary matrix satisfying U⁻¹ = U*, i.e., UU* = U*U = I (Eq. ??). To find the coefficient vector c, we pre-multiply both sides of Eq. 2.97 by U⁻¹ = U* to get:

$$U^{*}x=U^{-1}x=U^{-1}Uc=c \qquad(2.99)$$

Eqs. 2.97 and 2.99 can be rewritten as a pair of transforms:

$$\begin{cases}c=U^{*}x=U^{-1}x\\ x=Uc\end{cases} \qquad(2.100)$$

We see that the norm of x is conserved (Parseval's identity):

$$\|x\|^2=\langle x,x\rangle=\langle Uc,Uc\rangle=(Uc)^{*}Uc=c^{*}U^{*}Uc=c^{*}c=\langle c,c\rangle=\|c\|^2 \qquad(2.101)$$

Equivalently, the coefficient c[k] can also be found by taking the inner product with u_l on both sides of Eq. 2.97:

$$\langle x,u_l\rangle=\Big\langle\sum_{k=1}^{N}c[k]u_k,\,u_l\Big\rangle=\sum_{k=1}^{N}c[k]\langle u_k,u_l\rangle=\sum_{k=1}^{N}c[k]\delta[k-l]=c[l] \qquad(2.102)$$

Now the transform pair above can also be written as:

$$c[k]=\langle x,u_k\rangle=\sum_{n=1}^{N}x[n]\bar u[n,k],\quad(k=1,\cdots,N) \qquad(2.103)$$

$$x=\sum_{k=1}^{N}c[k]u_k=\sum_{k=1}^{N}\langle x,u_k\rangle\,u_k \qquad(2.104)$$

The second equation can also be written in component form as

$$x[n]=\sum_{k=1}^{N}c[k]u[n,k],\quad(n=1,\cdots,N) \qquad(2.105)$$

Obviously the N coefficients c[k] (k = 1, ..., N) can be obtained with computational complexity O(N²), in comparison with the complexity O(N³) needed to find B⁻¹ in Eq. 2.62 when a non-orthogonal basis b_k is used.

Consider another example, the L² space composed of all square-integrable functions defined over a < t < b, spanned by a set of orthonormal basis functions φ_k(t) satisfying:

$$\langle\phi_k(t),\phi_l(t)\rangle=\int_a^b\phi_k(t)\bar\phi_l(t)dt=\delta[k-l] \qquad(2.106)$$

Any x(t) in the space can be written as

$$x(t)=\sum_k c[k]\phi_k(t) \qquad(2.107)$$

Taking the inner product with φ_l(t) on both sides, we get:

$$\langle x(t),\phi_l(t)\rangle=\sum_k c[k]\langle\phi_k(t),\phi_l(t)\rangle=\sum_k c[k]\delta[k-l]=c[l] \qquad(2.108)$$

i.e.,

$$c[k]=\langle x(t),\phi_k(t)\rangle=\int_a^b x(t)\bar\phi_k(t)dt \qquad(2.109)$$

which is the projection of x(t) onto the unit basis function φ_k(t). Again we can easily get:

$$\|x(t)\|^2=\langle x(t),x(t)\rangle=\int_a^b x(t)\bar x(t)dt=\sum_k|c[k]|^2=\|c\|^2 \qquad(2.110)$$

Since orthogonal bases are more advantageous than non-orthogonal ones, it is often desirable to convert a given non-orthogonal basis {a₁, ..., a_N} into an orthogonal one {u₁, ..., u_N} by the following Gram-Schmidt orthogonalization process:

u₁ = a₁
u₂ = a₂ − P_{u₁}a₂
u₃ = a₃ − P_{u₁}a₃ − P_{u₂}a₃
...
u_N = a_N − Σ_{n=1}^{N−1} P_{u_n}a_N

where P_u a = (⟨a, u⟩/⟨u, u⟩)u is the orthogonal projection of a onto u (Eq. 2.36).
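A direct implementation of this process takes only a few lines; the Matlab/Octave sketch below (the function name is illustrative) keeps the resulting vectors merely orthogonal rather than normalized, matching the listing above and Example 2.4 below:

    function U = gramschmidt(A)
    % Columns of A: linearly independent vectors; columns of U: orthogonal vectors
    [N, M] = size(A);
    U = zeros(N, M);
    for k = 1:M
        v = A(:, k);
        for j = 1:k-1
            % subtract the projection P_{u_j} a_k = (<a_k,u_j>/<u_j,u_j>) u_j
            v = v - (U(:,j)'*A(:,k)) / (U(:,j)'*U(:,j)) * U(:,j);
        end
        U(:, k) = v;
    end
    end

For example, gramschmidt([1 -1; 0 2]) returns the orthogonal basis u₁ = [1, 0]^T, u₂ = [0, 2]^T of Example 2.4.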

Figure 2.4 Gram-Schmidt orthogonalization

Example 2.4: In Example 2.2, a vector x = [1, 2]^T in a 2-D space is represented under a basis composed of a₁ = [1, 0]^T and a₂ = [−1, 2]^T. Now we show that, based on this basis, an orthogonal basis can be constructed by the Gram-Schmidt orthogonalization process. In this case of N = 2, we have u₁ = a₁ = [1, 0]^T, P_{u₁}a₂ = [−1, 0]^T, and

$$u_2=a_2-P_{u_1}a_2=\begin{bmatrix}-1\\2\end{bmatrix}-\begin{bmatrix}-1\\0\end{bmatrix}=\begin{bmatrix}0\\2\end{bmatrix} \qquad(2.111)$$

We see that the new basis {u₁, u₂} is indeed orthogonal, as ⟨u₁, u₂⟩ = 0. Now the same vector x = [1, 2]^T can be represented by the new orthogonal basis as:

$$x=\begin{bmatrix}1\\2\end{bmatrix}=1u_1+1u_2=\begin{bmatrix}1\\0\end{bmatrix}+\begin{bmatrix}0\\2\end{bmatrix} \qquad(2.112)$$

In this particular case, both coefficients happen to be 1, c[1] = c[2] = 1, as illustrated in Fig. 2.4.

2.1.5 Signal Representation by Standard Bases

Here we consider, as a special case of the orthogonal bases, the standard basis of the N-D space R^N. When N = 3, a vector v = [x, y, z]^T is conventionally represented as:

$$v=\begin{bmatrix}x\\y\\z\end{bmatrix}=xi+yj+zk=x\begin{bmatrix}1\\0\\0\end{bmatrix}+y\begin{bmatrix}0\\1\\0\end{bmatrix}+z\begin{bmatrix}0\\0\\1\end{bmatrix} \qquad(2.113)$$

where i = [1, 0, 0]^T, j = [0, 1, 0]^T, and k = [0, 0, 1]^T are the three standard (or canonical) basis vectors along each of the three mutually perpendicular axes. This standard basis {i, j, k} of R³ can be generalized to R^N, spanned by a set of

N standard basis vectors defined as:

$$e_1=\begin{bmatrix}1\\0\\\vdots\\0\end{bmatrix},\quad e_2=\begin{bmatrix}0\\1\\\vdots\\0\end{bmatrix},\quad\cdots,\quad e_N=\begin{bmatrix}0\\0\\\vdots\\1\end{bmatrix} \qquad(2.114)$$

All components of the nth standard basis vector e_n are zero except the nth one, which is 1; i.e., the mth component of the nth vector e_n is e[m,n] = δ[m−n]. These standard basis vectors are indeed orthogonal, as ⟨e_m, e_n⟩ = δ[m−n] (m, n = 1, ..., N), and they form an identity matrix I = [e₁, ..., e_N], which is a special unitary matrix satisfying I* = I⁻¹ = I^T = I.

Given this standard basis of R^N, a vector x = [x[1], ..., x[N]]^T representing N samples of a time signal can be expressed as a linear combination of the N standard basis vectors:

$$x=\sum_{n=1}^{N}x[n]e_n=[e_1,\cdots,e_N]x=Ix \qquad(2.115)$$

and the mth component x[m] of x is:

$$x[m]=\sum_{n=1}^{N}x[n]e[m,n]=\sum_{n=1}^{N}x[n]\delta[m-n],\quad(m=1,\cdots,N) \qquad(2.116)$$

Comparing this equation with Eq. 1.30 in the previous chapter, we see that they are in exactly the same form (except that here the signal x has a finite number N of samples), indicating that whenever a discrete time signal is given in the form of a vector x = [x[1], ..., x[N]]^T, it is implicitly represented by the standard basis, i.e., the signal is decomposed in time into a set of components x[m], each corresponding to a particular time segment δ[m−n] at n = m.

However, while it may seem only natural and reasonable to decompose a signal into a set of time samples, or equivalently, to represent the signal vector by the standard basis, it is also possible, and sometimes more beneficial, to decompose the signal into a set of components along some dimension other than time, or equivalently, to represent the signal vector by an orthogonal basis obtained by rotating the standard basis. This is an important point which will be emphasized throughout the book.

The concept of representing a discrete time signal x[n] by the standard basis can be extended to the representation of a continuous time signal x(t) (0 < t < T). We first recall the unit square impulse function defined in Eq. 1.4:

$$\delta_\Delta(t)=\begin{cases}1/\Delta & 0\le t<\Delta\\ 0 & \text{else}\end{cases} \qquad(2.117)$$

based on which a set of basis functions e_n(t) = δ_Δ(t−nΔ) (n = 0, ..., N−1) can be obtained by translation in time by nΔ. These basis functions are obvi-

ously orthonormal:

$$\langle e_m(t),e_n(t)\rangle=\int_0^T\delta_\Delta(t-m\Delta)\,\delta_\Delta(t-n\Delta)\,\Delta\,dt=\delta[m-n] \qquad(2.118)$$

Next, we sample the continuous time signal x(t) with a sampling interval Δ = T/N to get a set of discrete samples {x[0], ..., x[N−1]}, and approximate the signal as:

$$x(t)\approx\hat{x}(t)=\sum_{n=0}^{N-1}x[n]\,e_n(t)\,\Delta=\sum_{n=0}^{N-1}x[n]\,\delta_\Delta(t-n\Delta)\,\Delta \qquad(2.119)$$

Here x[n]e_n(t)Δ represents the nth segment of the signal over the time duration nΔ ≤ t < (n+1)Δ, as illustrated in Fig. 2.5. We see that each of these functions e_n(t) = δ_Δ(t−nΔ) represents a certain time segment, the same as the standard basis e[m,n] = δ[m−n] in C^N. Note, however, that these δ_Δ(t−nΔ) do not form a basis that spans the function space L², as they are not complete, in the sense that they can only approximate, but not precisely represent, a continuous function x(t) ∈ L². This shortcoming can be overcome if we continuously reduce the sampling interval to get the Dirac delta in the limit Δ → 0:

$$\lim_{\Delta\to 0}\delta_\Delta(t)=\delta(t) \qquad(2.120)$$

Now the summation in Eq. 2.119 above becomes an integral, by which the function x(t) can be precisely represented:

$$\lim_{\Delta\to 0}\hat{x}(t)=\int_0^T x(\tau)\delta(t-\tau)d\tau=x(t) \qquad(2.121)$$

This equation is actually the same as Eq. 1.9 in the previous chapter. Now we have obtained a continuum of uncountable basis functions e_τ(t) = δ(t−τ) (for all 0 < τ < T), which are complete as well as orthonormal, i.e., they form a standard basis of the function space L², by which any continuous signal x(t) can be represented, just like the standard basis e_n of C^N, by which any discrete signal x[n] can be represented.

Again, it may seem only natural to represent a continuous time signal x(t) by the corresponding standard basis, as a sequence of time impulses x(τ)δ(t−τ). However, this is not the only way, or the best way, to represent the signal. The time signal can also be represented by a basis other than the standard basis δ(t−τ), so that the signal is decomposed along some dimension other than time. Such an alternative way of signal decomposition and representation may be desirable, as the signal can be more conveniently processed and analyzed, whatever the purpose of the signal processing task. This is actually the fundamental reason why different orthogonal transforms have been developed, as will be discussed in detail in future chapters.

Fig. 2.6 illustrates the idea that any given vector x can be equivalently represented under different bases, each corresponding to a different set of coefficients,

Figure 2.5 Vector representation of an N-D space (N=3)

Figure 2.6 Representations of the same vector x under different bases: the standard basis e_k (left), a unitary (orthogonal) basis u_k (middle), and a non-orthogonal basis b_k (right)

such as the standard basis, an orthogonal basis (any rotated version of the standard basis), or an arbitrary basis, not necessarily orthogonal at all. While non-orthogonal axes are never actually used, one always has many options in terms of which orthogonal basis to use.

2.1.6 An Example: the Fourier Transforms

To illustrate how a vector can be represented by an orthogonal basis that spans the space, we consider the following four Fourier bases, which span four different types of vector spaces for signals that are either continuous or discrete, of finite or infinite duration.

- u_k = [1, e^{j2πk/N}, ..., e^{j2πk(N−1)/N}]^T/√N (k = 0, ..., N−1) form a set of N orthonormal basis vectors that span C^N (Eq. 1.40):

$$\langle u_k,u_l\rangle=\frac{1}{N}\sum_{n=0}^{N-1}e^{j2\pi(k-l)n/N}=\delta[k-l] \qquad(2.122)$$

Any vector x = [x[0], ..., x[N−1]]^T in C^N can be expressed as:

$$x=\sum_{k=0}^{N-1}X[k]u_k=\sum_{k=0}^{N-1}\langle x,u_k\rangle\,u_k \qquad(2.123)$$

or in component form:

$$x[n]=\frac{1}{\sqrt{N}}\sum_{k=0}^{N-1}X[k]e^{j2\pi kn/N},\quad(0\le n\le N-1) \qquad(2.124)$$

where the coefficient X[k] is the projection of x onto u_k:

$$X[k]=\langle x,u_k\rangle=\sum_{n=0}^{N-1}x[n]\bar u[n,k]=\frac{1}{\sqrt{N}}\sum_{n=0}^{N-1}x[n]e^{-j2\pi nk/N} \qquad(2.125)$$

- u(f) = [..., e^{j2πmf/F}, ...]^T/√F (0 < f < F) form a set of uncountably infinite orthonormal basis vectors (of infinite dimensions) (Eq. 1.35) that span the l² space of all square-summable vectors of infinite dimensions:

$$\langle u(f),u(f')\rangle=\frac{1}{F}\sum_{m=-\infty}^{\infty}e^{j2\pi(f-f')m/F}=\delta(f-f') \qquad(2.126)$$

Any vector x = [..., x[n], ...]^T in this space can be expressed as:

$$x=\int_0^F X(f)u(f)df=\int_0^F\langle x,u(f)\rangle\,u(f)df \qquad(2.127)$$

or in component form:

$$x[n]=\frac{1}{\sqrt{F}}\int_0^F X(f)e^{j2\pi fn/F}df,\quad(-\infty<n<\infty) \qquad(2.128)$$

where the coefficient function X(f) is the projection of x onto u(f):

$$X(f)=\langle x,u(f)\rangle=\frac{1}{\sqrt{F}}\sum_{n=-\infty}^{\infty}x[n]e^{-j2\pi fn/F} \qquad(2.129)$$

- u_k(t) = e^{j2πkt/T}/√T (−∞ < k < ∞) form a set of infinitely many orthonormal basis functions (Eq. 1.33) that span the space of all square-integrable functions defined over 0 ≤ t < T:

$$\langle u_k(t),u_l(t)\rangle=\frac{1}{T}\int_0^T e^{j2\pi(k-l)t/T}dt=\delta[k-l] \qquad(2.130)$$

Any function x_T(t) in this space can be expressed as:

$$x_T(t)=\sum_{k=-\infty}^{\infty}X[k]u_k(t)=\frac{1}{\sqrt{T}}\sum_{k=-\infty}^{\infty}X[k]e^{j2\pi kt/T} \qquad(2.131)$$

where the coefficient X[k] is the projection of x(t) onto the kth basis function u_k(t):

$$X[k]=\langle x(t),u_k(t)\rangle=\int_0^T x(t)\bar u_k(t)dt=\frac{1}{\sqrt{T}}\int_0^T x(t)e^{-j2\pi kt/T}dt \qquad(2.132)$$

- u_f(t) = e^{j2πft} (−∞ < f < ∞) is a set of uncountably infinite orthonormal basis functions (Eq. 1.28) that span the L² space of all square-integrable functions defined over −∞ < t < ∞:

$$\langle u_f(t),u_{f'}(t)\rangle=\int_{-\infty}^{\infty}e^{j2\pi(f-f')t}dt=\delta(f-f') \qquad(2.133)$$

Any function x(t) in this space can be expressed as:

$$x(t)=\int_{-\infty}^{\infty}X(f)u_f(t)df=\int_{-\infty}^{\infty}X(f)e^{j2\pi ft}df \qquad(2.134)$$

where the coefficient function is the projection of x(t) onto u_f(t):

$$X(f)=\langle x(t),u_f(t)\rangle=\int_{-\infty}^{\infty}x(t)\bar u_f(t)dt=\int_{-\infty}^{\infty}x(t)e^{-j2\pi ft}dt \qquad(2.135)$$
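The first of the four bases above, the basis of the discrete Fourier transform, is easy to verify numerically; a Matlab/Octave sketch with N = 8 and a random test vector (both arbitrary choices):

    N = 8; n = (0:N-1)';
    U = exp(1j*2*pi*n*(0:N-1)/N) / sqrt(N);  % column k+1 is u_k of Eq. 2.122
    max(max(abs(U'*U - eye(N))))             % ~0: the columns are orthonormal
    x = randn(N, 1);
    X = U' * x;                              % coefficients X[k] = <x,u_k>, Eq. 2.125
    max(abs(U*X - x))                        % ~0: x = sum_k X[k] u_k, Eq. 2.123
    abs(norm(x)^2 - norm(X)^2)               % ~0: Parseval's identity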

2.2 Unitary Transformation and Signal Representation

2.2.1 Linear Transformation

Definition 2.19. Let V and W be two vector spaces. A transformation is a function or mapping T: V → W that converts a vector x ∈ V to another vector u ∈ W, denoted by Tx = u. If W = V, the linear transformation T is a linear operator. If the transformation is invertible, then the transformation that converts u ∈ W back to x ∈ V is the inverse transformation, denoted by x = T⁻¹u. An identity transformation maps a vector to itself: Ix = x. Obviously TT⁻¹ = T⁻¹T = I is an identity operator that maps a vector to itself:

$$TT^{-1}u=T(T^{-1}u)=Tx=u=Iu$$
$$T^{-1}Tx=T^{-1}(Tx)=T^{-1}u=x=Ix \qquad(2.136)$$

A transformation T is linear if the following is true:

$$T(ax+by)=aTx+bTy \qquad(2.137)$$

for any scalars a, b ∈ C and any vectors x, y ∈ V.

For example, the derivative and integral of a continuous function x(t) are linear operators:

$$T_d\,x(t)=\frac{d}{dt}x(t)=\dot x(t),\qquad T_i\,x(t)=\int x(\tau)d\tau \qquad(2.138)$$

For another example, an M by N matrix A, with its mn-th element a[m,n] ∈ C, is a linear transformation T_A: C^N → C^M that maps an N-D vector x ∈ C^N to an M-D vector y ∈ C^M:

$$T_A x=Ax=y \qquad(2.139)$$

or in component form:

$$\begin{bmatrix}y[1]\\y[2]\\\vdots\\y[M]\end{bmatrix}_{M\times 1}=\begin{bmatrix}a[1,1]&a[1,2]&\cdots&a[1,N]\\a[2,1]&a[2,2]&\cdots&a[2,N]\\\vdots&\vdots&\ddots&\vdots\\a[M,1]&a[M,2]&\cdots&a[M,N]\end{bmatrix}_{M\times N}\begin{bmatrix}x[1]\\x[2]\\\vdots\\x[N]\end{bmatrix}_{N\times 1} \qquad(2.140)$$

If M = N, then x, y ∈ C^N and A becomes a linear operator. However, note that the operation of translation T_t x = x + t is not a linear transformation:

$$T_t(ax+by)=ax+by+t\ \ne\ aT_tx+bT_ty=ax+by+(a+b)t \qquad(2.141)$$

Definition 2.20. For a linear transformation T: V → W, if there is another transformation T*: W → V such that

$$\langle Tx,u\rangle=\langle x,T^{*}u\rangle \qquad(2.142)$$

for any x ∈ V and u ∈ W, then T* is called the Hermitian adjoint, or simply the adjoint, of T. If a linear operator T: V → V is its own adjoint, i.e.,

$$\langle Tx,y\rangle=\langle x,Ty\rangle \qquad(2.143)$$

for any x, y ∈ V, then T is called a self-adjoint or Hermitian transformation.

In the following, the terms self-adjoint and Hermitian are used interchangeably. In particular, in the unitary space C^N, let B = A* be the adjoint of a matrix A, i.e., ⟨Ax, y⟩ = ⟨x, By⟩; then we have:

$$\langle Ax,y\rangle=(Ax)^{T}\bar y=x^{T}A^{T}\bar y=\langle x,By\rangle=x^{T}\overline{By} \qquad(2.144)$$

Comparing the two sides, we get A^T = B̄, i.e., the adjoint matrix B = A* = Ā^T is the conjugate transpose of A:

$$A^{*}=\bar A^{T} \qquad(2.145)$$

A matrix A is self-adjoint, or Hermitian, if A = A* = Ā^T, i.e.,

$$\langle Ax,y\rangle=\langle x,Ay\rangle \qquad(2.146)$$

In particular, when A = Ā is real, a self-adjoint matrix A = A* = A^T is symmetric. Note that we have always used A* to denote the conjugate transpose of

a matrix A, which we now see is also the adjoint of A, while the notation T* is more generally used to denote the adjoint of any operator T.

In a function space, if T* is the adjoint of a linear operator T, then the following holds:

$$\langle Tx(t),y(t)\rangle=\int Tx(t)\,\bar y(t)dt=\langle x(t),T^{*}y(t)\rangle=\int x(t)\,\overline{T^{*}y(t)}dt \qquad(2.147)$$

If T = T*, it is a self-adjoint, or Hermitian, operator.

2.2.2 Eigenvalue problems

Definition 2.21. If the application of an operator T to a vector x ∈ V results in another vector λx ∈ V, where λ ∈ C is a constant scalar:

$$Tx=\lambda x \qquad(2.148)$$

then the scalar λ is an eigenvalue of T, the vector x is the corresponding eigenvector (or eigenfunction) of T, and the equation above is called the eigenequation of the operator T. The set of all eigenvalues of an operator is called the spectrum of the operator.

In a unitary space C^N, an N by N matrix A is a linear operator, and the associated eigenequation is:

$$A\phi_n=\lambda_n\phi_n,\quad n=1,\cdots,N \qquad(2.149)$$

where λ_n and φ_n are the nth eigenvalue and the corresponding eigenvector of A, respectively.

In a function space, the nth-order differential operator Dⁿ = dⁿ/dtⁿ is a linear operator with the following eigenequation:

$$D^{n}\phi(t)=D^{n}e^{st}=\frac{d^{n}}{dt^{n}}e^{st}=s^{n}e^{st}=\lambda\phi(t) \qquad(2.150)$$

where s is a complex scalar. Here λ = sⁿ is the eigenvalue and the complex exponential φ(t) = e^{st} is the corresponding eigenfunction.

More generally, we can write an Nth-order linear constant-coefficient differential equation (LCCDE) as:

$$\sum_{n=0}^{N}a_n\frac{d^{n}}{dt^{n}}y(t)=\Big[\sum_{n=0}^{N}a_nD^{n}\Big]y(t)=x(t) \qquad(2.151)$$

where Σ_{n=0}^{N} a_nDⁿ is a linear operator applied to the function y(t), representing the response of a linear system to an input x(t). Obviously the same complex exponential φ(t) = e^{st} is also the eigenfunction of this operator, corresponding to the eigenvalue λ = Σ_{k=0}^{N} a_k s^k.

Perhaps the most well-known eigenvalue problem in physics is the Schrödinger equation, which describes a particle in terms of its energy and the de Broglie

wave function. Specifically, for a 1-D stationary single-particle system, we have:

$$\hat{H}\psi(x)=\Big[-\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2}+V(x)\Big]\psi(x)=E\psi(x) \qquad(2.152)$$

where

$$\hat{H}=-\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2}+V(x) \qquad(2.153)$$

is the Hamiltonian operator, ℏ is the Planck constant, and m and V(x) are the mass and potential energy of the particle, respectively. E is the eigenvalue of Ĥ, representing the total energy of the particle, and the wave function ψ(x) is the corresponding eigenfunction, also called an eigenstate, representing the probability amplitude of the particle, i.e., |ψ(x)|² is the probability for the particle to be found at position x.

Theorem 2.4. A self-adjoint operator has the following properties:
1. All eigenvalues are real;
2. The eigenvectors corresponding to different eigenvalues are orthogonal;
3. The family of all eigenvectors forms a complete orthogonal system.

Proof: Let λ and μ be two different eigenvalues of a self-adjoint operator T, and let x and y be the corresponding eigenvectors:

$$Tx=\lambda x,\qquad Ty=\mu y \qquad(2.154)$$

As T = T* is self-adjoint, we have:

$$\langle Tx,y\rangle=\langle x,Ty\rangle \qquad(2.155)$$

Substituting Tx = λx into Eq. 2.155 and letting y = x, we get

$$\langle\lambda x,x\rangle=\langle x,\lambda x\rangle,\quad\text{i.e.}\quad \lambda\langle x,x\rangle=\bar\lambda\langle x,x\rangle \qquad(2.156)$$

As in general ⟨x, x⟩ ≠ 0, we see that λ = λ̄ is real. Next, we substitute Tx = λx and Ty = μy into Eq. 2.155 and get:

$$\lambda\langle x,y\rangle=\bar\mu\langle x,y\rangle=\mu\langle x,y\rangle \qquad(2.157)$$

As in general λ ≠ μ, we get ⟨x, y⟩ = 0, i.e., x and y are orthogonal. The proof of the third property is beyond the scope of this book and is therefore omitted. Q.E.D.

For example, the Hamiltonian operator Ĥ in the Schrödinger equation is a self-adjoint operator with real eigenvalues E, representing the different energy levels corresponding to the different eigenstates of the particle.

The third property in Theorem 2.4 indicates that the eigenvectors of a self-adjoint operator can be used as an orthogonal basis of a vector space, so that any vector in the space can be represented as a linear combination of these eigenvectors.

In the space C^N, let λ_k and φ_k (k = 1, ..., N) be the eigenvalues and the corresponding eigenvectors of a Hermitian matrix A = A*; then its eigenequations can be written as:

$$A\phi_k=\lambda_k\phi_k,\quad k=1,\cdots,N \qquad(2.158)$$

We can further combine all N eigenequations to get:

$$A[\phi_1,\cdots,\phi_N]=[\phi_1,\cdots,\phi_N]\Lambda,\quad\text{or}\quad A\Phi=\Phi\Lambda \qquad(2.159)$$

where the matrices Φ and Λ are defined as:

$$\Phi=[\phi_1,\cdots,\phi_N],\qquad \Lambda=\begin{bmatrix}\lambda_1&&\\&\ddots&\\&&\lambda_N\end{bmatrix} \qquad(2.160)$$

As A is a self-adjoint operator, its eigenvalues λ_k are real, and the corresponding eigenvectors φ_k are orthogonal:

$$\langle\phi_k,\phi_l\rangle=\phi_k^{T}\bar\phi_l=\delta[k-l] \qquad(2.161)$$

and they form a complete orthogonal system spanning the N-D unitary space. Also, Φ is a unitary matrix satisfying:

$$\Phi^{*}\Phi=I,\quad\text{or}\quad \Phi^{*}=\Phi^{-1} \qquad(2.162)$$

The eigenequation in Eq. 2.159 can also be written in some other useful forms. First, pre-multiplying both sides of the equation by Φ⁻¹ = Φ*, we get:

$$\Phi^{-1}A\Phi=\Phi^{*}A\Phi=\Lambda \qquad(2.163)$$

i.e., the matrix A can be diagonalized by Φ. Alternatively, if we post-multiply both sides of Eq. 2.159 by Φ*, we get:

$$A=\Phi\Lambda\Phi^{*}=[\phi_1,\phi_2,\cdots,\phi_N]\begin{bmatrix}\lambda_1&&\\&\ddots&\\&&\lambda_N\end{bmatrix}\begin{bmatrix}\phi_1^{*}\\\phi_2^{*}\\\vdots\\\phi_N^{*}\end{bmatrix}=\sum_{k=1}^{N}\lambda_k\phi_k\phi_k^{*} \qquad(2.164)$$

i.e., the matrix A can be series-expanded as a linear combination of the N eigen-matrices φ_kφ_k* (k = 1, ..., N).
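The properties just derived (real eigenvalues, a unitary Φ, diagonalization, and the expansion of Eq. 2.164) can all be checked numerically; a Matlab/Octave sketch using a randomly generated Hermitian matrix (the size N is an arbitrary choice):

    N = 4;
    A = randn(N) + 1j*randn(N);
    A = (A + A') / 2;                   % force A to be Hermitian: A = A*
    [Phi, Lam] = eig(A);                % eigenvectors Phi, eigenvalues on diag(Lam)
    max(abs(imag(diag(Lam))))           % ~0: all eigenvalues are real
    max(max(abs(Phi'*Phi - eye(N))))    % ~0: Phi is unitary, Eq. 2.162
    max(max(abs(Phi*Lam*Phi' - A)))     % ~0: A = Phi Lam Phi*, Eq. 2.164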

First we show that $D^2$ is indeed a self-adjoint operator:

$\langle D^2x(t), y(t)\rangle = \langle x(t), D^2y(t)\rangle$   (2.165)

where $x(t)$ and $y(t)$ are two functions defined over a certain time interval such as $[0, T]$, and $D^2x(t) = \ddot x(t)$ is the second time derivative of the function $x(t)$. Using integration by parts, we can show that this equation does hold:

$\langle D^2x(t), y(t)\rangle = \int_0^T \ddot x(t)\overline{y(t)}\,dt = \dot x(t)\overline{y(t)}\Big|_0^T - \int_0^T \dot x(t)\overline{\dot y(t)}\,dt = \dot x(t)\overline{y(t)}\Big|_0^T - x(t)\overline{\dot y(t)}\Big|_0^T + \int_0^T x(t)\overline{\ddot y(t)}\,dt = \langle x(t), D^2y(t)\rangle$   (2.166)

Here we have assumed that all functions satisfy $x(0) = x(T)$ and $\dot x(0) = \dot x(T)$, so that

$\left[\dot x(t)\overline{y(t)} - x(t)\overline{\dot y(t)}\right]_0^T = 0$   (2.167)

Next, we find the eigenvalues and eigenfunctions of $D^2$ by solving this equation:

$D^2\phi(t) = \lambda\phi(t), \quad \text{i.e.} \quad \ddot\phi(t) - \lambda\phi(t) = 0, \quad \text{subject to:} \quad \phi(0) = \phi(T), \ \dot\phi(0) = \dot\phi(T)$   (2.168)

Consider the following three cases:

1. $\lambda = 0$: The equation becomes $\ddot\phi(t) = 0$ with solution $\phi(t) = c_1t + c_2$. Substituting this $\phi(t)$ into the boundary conditions, we have:

$\phi(0) = c_2 = \phi(T) = c_1T + c_2$   (2.169)

We get $c_1 = 0$, and the eigenfunction $\phi(t) = c_2$ is any constant.

2. $\lambda > 0$: We assume $\phi(t) = e^{st}$ and substitute it into the equation to get

$(s^2 - \lambda)e^{st} = 0, \quad \text{i.e.} \quad s = \pm\sqrt\lambda$   (2.170)

The solution is $\phi(t) = ce^{\pm\sqrt\lambda t}$. Substituting this into the boundary conditions, we have:

$\phi(0) = c = \phi(T) = ce^{\pm\sqrt\lambda T}$   (2.171)

Obviously this equation holds only if $\lambda = 0$, as in the previous case.

3. $\lambda < 0$: We assume $\lambda = -\omega^2$, i.e., $\sqrt\lambda = \pm j\omega$, and the solution is

$\phi(t) = ce^{\pm\sqrt\lambda t} = ce^{\pm j\omega t}$   (2.172)

Substituting this into the boundary conditions, we have:

$\phi(0) = c = \phi(T) = ce^{\pm j\omega T}, \quad \text{i.e.} \quad e^{\pm j\omega T} = 1$   (2.173)

which can be solved to get:

$\omega T = 2k\pi, \quad \text{i.e.} \quad \omega = \frac{2k\pi}{T} = 2k\pi f_0 = k\omega_0, \quad (k = 0, \pm1, \pm2, \dots)$   (2.174)

where we have defined

$f_0 = \frac{1}{T}, \quad \omega_0 = 2\pi f_0 = \frac{2\pi}{T}$   (2.175)

Now the eigenvalues and the corresponding eigenfunctions can be written as:

$\lambda_k = -\omega^2 = -(k\omega_0)^2 = -(2k\pi f_0)^2 = -(2k\pi/T)^2$   (2.176)

$\phi_k(t) = ce^{\pm jk\omega_0 t} = ce^{\pm j2k\pi f_0 t} = ce^{\pm j2k\pi t/T}, \quad (k = 0, \pm1, \pm2, \dots)$   (2.177)

In particular, when $k = 0$ we have $\lambda_0 = 0$ and $\phi_0(t) = c$, same as the first case above. These eigenvalues and their corresponding eigenfunctions have the following properties:

The eigenvalues are discrete; the gap between two consecutive eigenvalues is:

$\Delta\lambda_k = \lambda_{k+1} - \lambda_k$   (2.178)

The eigenfunctions are also discrete, with a frequency gap between two consecutive eigenfunctions:

$\Delta\omega = \omega_0 = 2\pi f_0 = 2\pi/T$   (2.179)

All eigenfunctions $\phi_k(t)$ are periodic with period $T$:

$\phi_k(t+T) = e^{j2k\pi(t+T)/T} = e^{j2k\pi t/T}e^{j2k\pi} = e^{j2k\pi t/T} = \phi_k(t)$   (2.180)

According to the properties of self-adjoint operators discussed above, the eigenfunctions $\phi_k(t)$ of $D^2$ form a complete orthogonal system. The orthogonality can be easily verified:

$\langle\phi_k(t), \phi_l(t)\rangle = c^2\int_0^T e^{jk\omega_0 t}e^{-jl\omega_0 t}\,dt = c^2\int_0^T \cos\Big(\frac{2\pi(k-l)t}{T}\Big)dt + jc^2\int_0^T \sin\Big(\frac{2\pi(k-l)t}{T}\Big)dt = \begin{cases} c^2T & k = l\\ 0 & k \ne l\end{cases}$   (2.181)

If we let $c = 1/\sqrt T$, then the eigenfunctions become

$\phi_k(t) = \frac{1}{\sqrt T}e^{j2k\pi t/T} = \frac{1}{\sqrt T}e^{j2k\pi f_0 t}$   (2.182)

which are orthonormal:

$\langle\phi_k(t), \phi_l(t)\rangle = \frac{1}{T}\int_0^T e^{j2\pi(k-l)t/T}\,dt = \delta[k-l]$   (2.183)
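The orthonormality of Eq. 2.183 can be checked by direct numerical integration; in the following sketch the period $T$ and the grid size are arbitrary choices made purely for illustration:

```python
import numpy as np

T = 2.0                                  # period (arbitrary choice)
t = np.linspace(0.0, T, 4000, endpoint=False)
dt = t[1] - t[0]

def phi(k):
    # phi_k(t) = exp(j 2 k pi t / T) / sqrt(T), Eq. (2.182)
    return np.exp(2j * np.pi * k * t / T) / np.sqrt(T)

# Numerical inner products <phi_k, phi_l> = integral of phi_k conj(phi_l) dt
for k, l in [(1, 1), (1, 2), (3, -3), (0, 5)]:
    ip = np.sum(phi(k) * np.conj(phi(l))) * dt
    print(k, l, np.round(ip, 6))         # ~1 when k == l, ~0 otherwise, Eq. (2.183)
```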

This is the orthonormality relation introduced earlier in the chapter. As a complete orthogonal system, these orthonormal eigenfunctions form a basis that spans the function space over $[0, T]$, i.e., all periodic functions $x_T(t) = x_T(t+T)$ can be represented as a linear combination of these basis functions:

$x_T(t) = \sum_{k=-\infty}^\infty X[k]\phi_k(t) = \sum_{k=-\infty}^\infty X[k]e^{j2k\pi f_0 t} = \sum_{k=-\infty}^\infty X[k]e^{jk\omega_0 t}$   (2.184)

where $X[k]$ $(k = 0, \pm1, \pm2, \dots)$ are the corresponding coefficients. This is the Fourier series expansion, to be discussed in detail in the next chapter.

The expansion of a non-periodic function can be similarly obtained if we let $T \to \infty$, so that in the limit the periodic function $x_T(t)$ becomes non-periodic and the following takes place:

The discrete variables $k\omega_0 = 2k\pi/T$ $(k = 0, \pm1, \pm2, \dots)$ become a continuous variable $-\infty < \omega < \infty$;

The gap between two consecutive eigenvalues becomes zero, i.e., $\Delta\lambda_k \to 0$, and the discrete eigenvalues $\lambda_k = -(2k\pi/T)^2$ become a continuous eigenvalue function $\lambda = -\omega^2$;

The frequency gap $\Delta\omega$ between two consecutive eigenfunctions becomes zero, and the discrete eigenfunctions $\phi_k(t) = e^{j2k\pi t/T}$ $(k = 0, \pm1, \pm2, \dots)$ become a set of uncountable non-periodic eigenfunctions $\phi_f(t) = e^{j2\pi ft}$ for all $-\infty < f < \infty$.

We see that the same self-adjoint operator $D^2$ is now defined over a different interval $(-\infty, \infty)$; correspondingly, its eigenfunctions $\phi_f(t) = e^{j\omega t} = e^{j2\pi ft} = \phi(t, f)$ become a continuous function of $f$ as well as of $t$, and they form a complete orthogonal system spanning the function space of all non-periodic functions:

$\langle\phi_f(t), \phi_{f'}(t)\rangle = \int_{-\infty}^\infty e^{j2\pi(f-f')t}\,dt = \delta(f-f')$   (2.185)

This is the continuous counterpart of the orthonormality relation above. Now $\phi_f(t)$ becomes a set of uncountably infinite basis functions, and any non-periodic square integrable function $x(t)$ can be represented as:

$x(t) = \int_{-\infty}^\infty X(f)\phi_f(t)\,df = \int_{-\infty}^\infty X(f)e^{j2\pi ft}\,df$   (2.186)

where $X(f)$ is the corresponding weighting function. This is the Fourier transform, to be discussed in detail in the next chapter.

2.2.4 Unitary Transformations

Definition: A linear transformation $U: V \to W$ is a unitary transformation if it conserves inner products:

$\langle x, y\rangle = \langle Ux, Uy\rangle$   (2.187)

In particular, if the vectors are real with symmetric inner product $\langle x, y\rangle = \langle y, x\rangle$, then $U$ is an orthogonal transformation.

Obviously a unitary transformation also conserves any measurement based on the inner product, such as the norm of a vector, the distance and angle between two vectors, and the projection of one vector onto another. Also, if in particular $x = y$, we have

$\langle x, x\rangle = \|x\|^2 = \langle Ux, Ux\rangle = \|Ux\|^2$   (2.188)

i.e., the unitary transformation conserves the vector norm (length). This is Parseval's identity for a generic unitary transformation $Ux$. Due to this property, a unitary operation $R: V \to V$ can be intuitively interpreted as a rotation in the space $V$.¹

Theorem 2.5. A linear transformation $U$ is unitary if and only if its adjoint $U^*$ is equal to its inverse $U^{-1}$:

$U^* = U^{-1}, \quad \text{i.e.} \quad U^*U = UU^* = I$   (2.189)

Proof: We let $Uy = d$, i.e., $y = U^{-1}d$, in Eq. 2.187 and get

$\langle Ux, d\rangle = \langle x, U^{-1}d\rangle = \langle x, U^*d\rangle$   (2.190)

i.e., $U^{-1} = U^*$. Q.E.D.

Due to this theorem, Eq. 2.189 can be used as an alternative definition of a unitary operator.

In the generalized Fourier expansion in Eqs. 2.88 and 2.89, based on the Plancherel theorem (Thm. 2.3), the coefficient vector $c = [\dots, c[k], \dots]^T$ composed of $c[k] = \langle x, u_k\rangle$ can be considered as a transformation $c = Ux$, and we can also have another transformation $d = Uy$. Now we have (Eq. 2.85):

$\langle x, y\rangle = \langle c, d\rangle = \langle Ux, Uy\rangle$   (2.191)

indicating that the inner product is conserved by $U$, i.e., the generalized Fourier expansion $c = Ux$ is actually a unitary transformation.

When a unitary operator $U$ is applied to an orthonormal basis $\{u_k\}$, the basis is rotated to become another orthonormal basis $\{v_k = Uu_k\}$ that spans the same space:

$\langle v_k, v_l\rangle = \langle Uu_k, Uu_l\rangle = \langle u_k, u_l\rangle = \delta[k-l]$   (2.192)

Specially, when a unitary operator $U$ is applied to the standard basis $\{e_k\}$, this basis is rotated to become a unitary basis $\{u_k = Ue_k\}$.

¹ Strictly speaking, a unitary transformation may also correspond to other norm-preserving operations such as reflection and inversion, which can all be treated as rotations in the most general sense.
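Theorems 2.5 and 2.6 (below) are easily demonstrated numerically. The sketch below uses the normalized DFT matrix, a standard example of a unitary matrix, chosen here only as an illustration:

```python
import numpy as np

N = 8
n = np.arange(N)
# Normalized DFT matrix, a standard example of a unitary matrix
U = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)

print(np.allclose(U.conj().T @ U, np.eye(N)))    # U*U = I, Eq. (2.189)

x = np.random.randn(N) + 1j * np.random.randn(N)
y = np.random.randn(N) + 1j * np.random.randn(N)

# Inner products and norms are conserved, Eqs. (2.187) and (2.188)
ip = lambda a, b: np.sum(a * np.conj(b))
print(np.allclose(ip(U @ x, U @ y), ip(x, y)))
print(np.allclose(np.linalg.norm(U @ x), np.linalg.norm(x)))

# All eigenvalues of a unitary matrix lie on the unit circle
print(np.allclose(np.abs(np.linalg.eigvals(U)), 1.0))
```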

2.2.5 Unitary Transformations in N-D Space

We consider specifically the unitary transformation in the N-D unitary space $C^N$.

Definition: A matrix $U$ is unitary if it conserves inner products:

$\langle Ux, Uy\rangle = \langle x, y\rangle$   (2.193)

Theorem 2.6. A matrix $U$ is unitary if and only if $U^*U = I$, i.e., the following two statements are equivalent:

(a) $\langle Ux, Uy\rangle = \langle x, y\rangle$   (2.194)
(b) $U^*U = UU^* = I$, i.e., $U^{-1} = U^*$   (2.195)

Proof: We first show that (b) implies (a):

$\langle Ux, Uy\rangle = (Ux)^T\overline{Uy} = x^TU^T\bar U\bar y = x^T\bar y = \langle x, y\rangle$   (2.196)

where $U^T\bar U = \overline{U^*U} = I$. Next we show that (a) implies (b). Statement (a) can be written as:

$(Ux)^*Ux = x^*U^*Ux = x^*x$   (2.197)

i.e.,

$x^*(U^*U - I)x = 0$   (2.198)

Since in general $x \ne 0$, we must have $U^*U = I$. Post-multiplying this equation by $U^{-1}$, we get $U^* = U^{-1}$. Pre-multiplying this new equation by $U$, we get $UU^* = I$. Q.E.D.

As (a) and (b) in the theorem above are equivalent, either of them can be used as the definition of a unitary matrix. If a unitary matrix $U = \bar U$ is real, i.e., $U^{-1} = U^T$, it is called an orthogonal matrix. A unitary matrix $U$ has the following properties:

The unitary transformation $Ux$ conserves the vector norm, i.e., $\|Ux\| = \|x\|$ for any $x \in C^N$;

All eigenvalues $\{\lambda_1, \dots, \lambda_N\}$ of $U$ have an absolute value of 1, $|\lambda_k| = 1$, i.e., they lie on the unit circle in the complex plane;

The determinant of $U$ has an absolute value of 1, $|\det(U)| = 1$. This can be easily seen as $\det(U) = \prod_{k=1}^N \lambda_k$;

All column (or row) vectors of $U = [u_1, \dots, u_N]$ are orthonormal:

$\langle u_k, u_l\rangle = \delta[k-l]$   (2.199)

The last property indicates that the column (row) vectors $\{u_k\}$ form an orthogonal basis that spans $C^N$. Any vector $x = [x[1], \dots, x[N]]^T \in C^N$ represented by

the standard basis $I = [e_1, \dots, e_N]$ as:

$x = [x[1], \dots, x[N]]^T = \sum_{n=1}^N x[n]e_n = [e_1, \dots, e_N][x[1], \dots, x[N]]^T = Ix$   (2.200)

can also be represented by the basis $U = [u_1, \dots, u_N]$ as:

$x = Ix = UU^*x = Uc = [u_1, \dots, u_N][c[1], \dots, c[N]]^T = \sum_{k=1}^N c[k]u_k$   (2.201)

where we have defined

$c = [c[1], \dots, c[N]]^T = U^*x = [u_1^*, \dots, u_N^*]^Tx$   (2.202)

Combining the two equations, we get

$\begin{cases} c = U^*x\\ x = Uc\end{cases} \quad \text{i.e.} \quad c[k] = u_k^*x = \langle x, u_k\rangle$   (2.203)

This is the generalized Fourier transform in Eqs. 2.88 and 2.89, by which a vector $x$ is rotated to become another vector $c$. This result can be extended to the continuous transformation first given in Eqs. 2.91 and 2.93 for signal vectors in the form of continuous functions.

In general, corresponding to any given unitary transformation $U$, a signal vector $x \in H$ can be alternatively represented by a coefficient vector $c = U^*x$ (where $c$ can be either a set of discrete coefficients $c[k]$ or a continuous function $c(f)$). The original signal vector $x$ can always be reconstructed from $c$ by applying $U$ on both sides of $c = U^*x$ to get $Uc = UU^*x = Ix = x$, i.e., we get a unitary transform pair in the most general form:

$\begin{cases} c = U^*x\\ x = Uc\end{cases}$   (2.204)

The first equation is the forward transform that maps the signal vector $x$ to a coefficient vector $c$, while the second equation is the inverse transform by which the signal is reconstructed. In particular, when $U = I$ is an identity operator, both equations in Eq. 2.204 become an identity $x = Ix = x$, i.e., no transformation is carried out.

Previously we considered the rotation of a given vector $x$. We next consider the rotation of the basis that spans the space. Specifically, let $\{a_k\}$ be an arbitrary basis of $C^N$ (not necessarily orthogonal); then any vector $x$ can be represented in terms of a set of coefficients $c[k]$:

$x = \sum_{k=1}^N c[k]a_k$   (2.205)

Figure 2.7 Rotation of vectors and bases

Rotating this vector by a unitary matrix $U$, we get a new vector:

$Ux = U\left[\sum_{k=1}^N c[k]a_k\right] = \sum_{k=1}^N c[k]Ua_k = \sum_{k=1}^N c[k]a_k' = y$   (2.206)

This equation indicates that the vector $y$ after the rotation can still be represented by the same set of coefficients $c[k]$, if the basis $\{a_k\}$ is also rotated the same way to become $a_k' = Ua_k$, as illustrated in Fig. 2.7(a) for the 2-D case.

Under the original basis $\{a_k\}$, the rotated vector $y$ can be represented in terms of a set of new coefficients $\{\dots, d[k], \dots\}$:

$y = \sum_{k=1}^N d[k]a_k = [a_1, \dots, a_N][d[1], \dots, d[N]]^T$   (2.207)

The $N$ new coefficients $d[n]$ can be obtained by solving this linear equation system of $N$ equations (with $O(N^3)$ complexity).

On the other hand, if we rotate $y$ in the opposite direction by the inverse matrix $U^{-1} = U^*$, of course we get $x$ back:

$U^{-1}y = U^{-1}\left[\sum_{k=1}^N d[k]a_k\right] = \sum_{k=1}^N d[k]U^{-1}a_k = \sum_{k=1}^N d[k]b_k$   (2.208)

where $b_k = U^{-1}a_k = U^*a_k$ is the $k$th vector of a new basis obtained by rotating $a_k$ of the old basis in the opposite direction. In fact, as

$P_{a_k}(y) = \frac{\langle y, a_k\rangle}{\|a_k\|} = \frac{\langle Ux, Ub_k\rangle}{\|Ub_k\|} = \frac{\langle x, b_k\rangle}{\|b_k\|} = P_{b_k}(x)$   (2.209)

we see that the scalar projection of the new vector $y = Ux$ onto the old basis $a_k$ is the same as that of the old vector $x$ onto the new basis $b_k = U^{-1}a_k$. In other words, a rotation of the vector is equivalent to a rotation of the basis in the opposite direction, as one would intuitively expect. This is illustrated in Fig. 2.7(b). A rotation in a 3-D space is illustrated in Fig. 2.8.
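The equivalence in Eq. 2.209 between rotating a vector and counter-rotating the basis can be verified numerically; the rotation angle and the (non-orthogonal) basis in the following sketch are arbitrary choices made for illustration:

```python
import numpy as np

theta = 0.3                              # arbitrary rotation angle
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

a1, a2 = np.array([1.0, 0.0]), np.array([-1.0, 2.0])   # a non-orthogonal basis
x = np.array([1.0, 2.0])

# Scalar projection of v onto a: <v, a>/||a||
proj = lambda v, a: np.dot(v, a) / np.linalg.norm(a)

y = U @ x                                # rotate the vector
b1, b2 = U.T @ a1, U.T @ a2              # counter-rotate the basis (U^-1 = U^T)

# Projection of the rotated vector onto the old basis equals the projection
# of the old vector onto the counter-rotated basis, Eq. (2.209)
print(np.isclose(proj(y, a1), proj(x, b1)))
print(np.isclose(proj(y, a2), proj(x, b2)))
```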

Figure 2.8 Rotation of coordinate system

In summary, multiplication of a vector $x \in C^N$ by a unitary matrix corresponds to a rotation of the vector. The transformation pair in Eq. 2.203 can therefore be interpreted as a rotation of $x$ to get the coefficients $U^*x = c$, while a rotation of $c$ in the opposite direction, $x = Uc$, gets the original vector $x$ back. Moreover, a different rotation $d = V^*x$ by another unitary matrix $V$ will result in a different set of coefficients $d$, and these two sets of coefficients $c$ and $d$ are also related by a rotation corresponding to a unitary matrix $W = V^*U$:

$d = V^*x = V^*Uc = Wc$   (2.210)

Example 2.5: In Example 2.2, a vector $x = [1, 2]^T = 1e_1 + 2e_2$ is represented under a basis composed of $a_1 = [1, 0]^T$ and $a_2 = [-1, 2]^T$:

$x = 2a_1 + 1a_2 = 2\begin{bmatrix}1\\0\end{bmatrix} + \begin{bmatrix}-1\\2\end{bmatrix} = \begin{bmatrix}1\\2\end{bmatrix}$   (2.211)

This basis $\{a_1, a_2\}$ can be rotated by $\theta = 45°$ by an orthogonal matrix:

$R = \begin{bmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix} = 0.707\begin{bmatrix}1 & -1\\ 1 & 1\end{bmatrix}$   (2.212)

to become a new basis $\{b_1, b_2\}$:

$b_1 = Ra_1 = R\begin{bmatrix}1\\0\end{bmatrix} = 0.707\begin{bmatrix}1\\1\end{bmatrix}, \quad b_2 = Ra_2 = R\begin{bmatrix}-1\\2\end{bmatrix} = 0.707\begin{bmatrix}-3\\1\end{bmatrix}$   (2.213)

Figure 2.9 Rotation of a basis

Under this new basis, $x$ is represented as:

$x = c'[1]b_1 + c'[2]b_2 = c'[1]\,0.707\begin{bmatrix}1\\1\end{bmatrix} + c'[2]\,0.707\begin{bmatrix}-3\\1\end{bmatrix} = 0.707\begin{bmatrix}1 & -3\\ 1 & 1\end{bmatrix}\begin{bmatrix}c'[1]\\ c'[2]\end{bmatrix} = \begin{bmatrix}1\\2\end{bmatrix}$   (2.214)

Solving this we get $c'[1] = 2.47$ and $c'[2] = 0.35$, i.e., $x = 2.47b_1 + 0.35b_2$, as shown in Fig. 2.9. In this case, the coefficients $c'[1]$ and $c'[2]$ cannot be found as the projections of $x$ onto the basis vectors $b_1$ and $b_2$, as these are not orthogonal. We see that the same vector $x$ can be equivalently represented by different bases:

$x = 1e_1 + 2e_2 = 2a_1 + 1a_2 = 2.47b_1 + 0.35b_2$   (2.215)

2.3 Projection Theorem and Signal Approximation

2.3.1 Projection Theorem and Pseudo-Inverse

A signal in a high-dimensional space (possibly infinite-dimensional) may need to be approximated in a lower-dimensional subspace, for various reasons such as computational complexity reduction and data compression. Although a complete basis is necessary to represent any given vector in a vector space exactly, it is still possible to approximate the vector in a subspace if a certain error is allowed. Also, a continuous function may not be accurately representable in a finite-dimensional space, but it may still be necessary to approximate the function in such a space for the signal processing desired. The issue in such approximations is how to minimize the error.

Let $H$ be a Hilbert space (finite or infinite dimensional), let $U \subset H$ be an M-D subspace spanned by a set of $M$ basis vectors $\{a_1, \dots, a_M\}$ (not necessarily orthogonal), and assume a given vector $x \in H$ is approximated by a vector $\hat x \in$

$U$:

$\hat x = \sum_{n=1}^M c[n]a_n$   (2.216)

An error vector is defined as

$\tilde x = x - \hat x = x - \sum_{n=1}^M c[n]a_n$   (2.217)

The least squares error of the approximation is defined as:

$\varepsilon = \|\tilde x\|^2 = \langle\tilde x, \tilde x\rangle$   (2.218)

The goal is to find a set of coefficients $c[1], \dots, c[M]$ so that the error $\varepsilon$ is minimized.

Theorem 2.7. (The projection theorem) The least squares error $\varepsilon = \|\tilde x\|^2$ of the approximation in Eq. 2.216 is minimized if and only if the error vector $\tilde x = x - \hat x$ is orthogonal to the subspace $U$:

$\tilde x \perp U, \quad \text{i.e.} \quad \tilde x \perp a_n, \quad (n = 1, \dots, M)$   (2.219)

Proof: Let $\hat x$ and $\hat x'$ be two vectors both in the subspace $U$, where $\hat x'$ is arbitrary but $\hat x$ is the projection of $x$ onto $U$, i.e., $(x - \hat x) \perp U$. As $\hat x - \hat x'$ is also a vector in $U$, we have $(x - \hat x) \perp (\hat x - \hat x')$, i.e., $\langle x - \hat x, \hat x - \hat x'\rangle = 0$. Now consider the error associated with $\hat x'$:

$\|x - \hat x'\|^2 = \|x - \hat x + \hat x - \hat x'\|^2 = \|x - \hat x\|^2 + \langle x - \hat x, \hat x - \hat x'\rangle + \langle\hat x - \hat x', x - \hat x\rangle + \|\hat x - \hat x'\|^2 = \|x - \hat x\|^2 + \|\hat x - \hat x'\|^2 \ge \|x - \hat x\|^2$   (2.220)

We see that the error $\|x - \hat x'\|^2$ associated with $\hat x'$ is always greater than the error $\|x - \hat x\|^2$ associated with $\hat x$, unless $\hat x' = \hat x$; i.e., the error is minimized if and only if the approximation is $\hat x$, the projection of $x$ onto the subspace $U$. Q.E.D.

This theorem can be understood intuitively as shown in Fig. 2.10, where a vector $x$ in a 3-D space is approximated by a vector $\hat x = c[1]a_1 + c[2]a_2$ in a 2-D subspace. The error $\varepsilon = \|x - \hat x\|^2$ is indeed minimal if $x - \hat x$ is orthogonal to the 2-D plane spanned by the basis vectors $a_1$ and $a_2$, as any other vector $\hat x'$ in this plane would be associated with a larger error; i.e., the approximation $\hat x$ is the projection of $x$ onto the subspace $U$.

The coefficients corresponding to the optimal approximation can be found based on the projection theorem. As the minimum error vector $\tilde x$ has to be

Figure 2.10 Projection theorem

orthogonal to each of the basis vectors that span the subspace $U$, we have:

$\langle\tilde x, a_m\rangle = \langle x - \sum_{n=1}^M c[n]a_n,\ a_m\rangle = \langle x, a_m\rangle - \sum_{n=1}^M c[n]\langle a_n, a_m\rangle = 0, \quad (m = 1, \dots, M)$   (2.221)

i.e.,

$\langle x, a_m\rangle = \sum_{n=1}^M c[n]\langle a_n, a_m\rangle, \quad (m = 1, \dots, M)$   (2.222)

These $M$ equations can be written in matrix form:

$\begin{bmatrix}\langle x, a_1\rangle\\ \vdots\\ \langle x, a_M\rangle\end{bmatrix}_{M\times 1} = \begin{bmatrix}\langle a_1, a_1\rangle & \cdots & \langle a_M, a_1\rangle\\ \vdots & \ddots & \vdots\\ \langle a_1, a_M\rangle & \cdots & \langle a_M, a_M\rangle\end{bmatrix}_{M\times M}\begin{bmatrix}c[1]\\ \vdots\\ c[M]\end{bmatrix}_{M\times 1}$   (2.223)

Solving this system of $M$ equations in $M$ unknowns, we get the optimal coefficients $c[n]$, and the vector $x$ is approximated in the M-D subspace as:

$\hat x = \sum_{n=1}^M c[n]a_n$   (2.224)

In particular, when the basis vectors are orthogonal (considered below), the coefficients become $c[n] = \langle x, a_n\rangle/\|a_n\|^2$, and $\hat x = \sum_{n=1}^M p_{a_n}(x)$ is simply the vector sum of the projections of $x$ onto each of the basis vectors $a_n$ $(n = 1, \dots, M)$ of the subspace $U$.

The computational complexity of solving the linear system in Eq. 2.223 is $O(M^3)$. However, if the basis vectors of the Hilbert space are orthogonal, $\langle a_m, a_n\rangle = 0$ for all $m \ne n$, then all off-diagonal components of the $M$ by $M$ matrix in Eq. 2.223 above are zero, and each of the coefficients can be obtained

independently with complexity $O(M^2)$:

$c[n] = \frac{\langle x, a_n\rangle}{\langle a_n, a_n\rangle} = \frac{\langle x, a_n\rangle}{\|a_n\|^2}, \quad (n = 1, \dots, M)$   (2.225)

Moreover, if the basis is orthonormal with $\langle a_n, a_n\rangle = \|a_n\|^2 = 1$, then we have

$c[n] = \langle x, a_n\rangle, \quad (n = 1, \dots, M)$   (2.226)

and

$\hat x = \sum_{n=1}^M \langle x, a_n\rangle a_n$   (2.227)

Consider for example the space $C^N$ spanned by a basis $\{a_1, \dots, a_N\}$ (not necessarily orthogonal). We wish to express a given vector $x \in C^N$ in an M-D subspace spanned by $M$ basis vectors $\{a_1, \dots, a_M\}$ as:

$x = [x[1], \dots, x[N]]^T = \sum_{n=1}^M c[n]a_n = [a_1, \dots, a_M]_{N\times M}[c[1], \dots, c[M]]^T_{M\times 1} = Ac$   (2.228)

This equation system is over-determined, with only $M$ unknowns $c[1], \dots, c[M]$ but $N > M$ equations. As the $N$ by $M$ non-square matrix $A$ is not invertible, the system has no solution in general, indicating the impossibility of representing the N-D vector $x$ exactly in an M-D subspace. However, based on the projection theorem, we can find the optimal approximation of $x$ in the M-D subspace by solving Eq. 2.223. In this case the inner products in the equation become $\langle x, a_n\rangle = a_n^*x$ and $\langle a_m, a_n\rangle = a_n^*a_m$, and Eq. 2.223 can be written as:

$A^*x = A^*Ac$   (2.229)

where $A^*A$ is an $M$ by $M$ square matrix and therefore invertible. Pre-multiplying both sides by its inverse $(A^*A)^{-1}$, we can find the optimal solution for $c$ of the over-determined equation system, corresponding to the minimum least squares error:

$c = (A^*A)^{-1}A^*x = A^\dagger x$   (2.230)

where

$A^\dagger = (A^*A)^{-1}A^*$   (2.231)

is an $M$ by $N$ matrix, known as the generalized inverse or pseudo-inverse of the $N$ by $M$ matrix $A$, and we have $A^\dagger A = I$.² If all $N$ basis vectors can be used,

² The pseudo-inverse in Eq. 2.231 is for the case where $A$ has more rows than columns ($M < N$ in this case). If $A$ has more columns than rows ($M > N$), the pseudo-inverse becomes:

$A^\dagger = A^*(AA^*)^{-1}$   (2.232)

then $A$ becomes an $N$ by $N$ square matrix and the pseudo-inverse becomes the regular inverse:

$A^\dagger = (A^*A)^{-1}A^* = A^{-1}(A^*)^{-1}A^* = A^{-1}$   (2.233)

and the coefficients can be found simply by:

$c = A^{-1}x$   (2.234)

If the basis vectors are orthogonal, i.e., $\langle a_m, a_n\rangle = 0$ for all $m \ne n$, the $M$ coefficients can be found as

$c[n] = \frac{\langle x, a_n\rangle}{\langle a_n, a_n\rangle} = \frac{\langle x, a_n\rangle}{\|a_n\|^2}, \quad (n = 1, \dots, M)$   (2.235)

with complexity $O(M^2)$. Moreover, if the basis is orthonormal with $\|a_n\|^2 = 1$, the coefficients become:

$c[n] = \langle x, a_n\rangle = a_n^*x, \quad (n = 1, \dots, M)$   (2.236)

and the approximation becomes:

$\hat x_M = \sum_{n=1}^M c[n]a_n = \sum_{n=1}^M \langle x, a_n\rangle a_n$   (2.237)

This is actually the unitary transformation in Eq. 2.203. We see that under any orthonormal basis $\{a_n\}$ of $C^N$, a given vector $x$ can always be optimally approximated in the M-D subspace $(M < N)$ with least squares error:

$\varepsilon = \|\tilde x\|^2 = \langle x - \hat x_M, x - \hat x_M\rangle = \langle x, x\rangle - \langle x, \hat x_M\rangle - \langle\hat x_M, x\rangle + \langle\hat x_M, \hat x_M\rangle$
$= \|x\|^2 - \sum_{n=1}^M \langle x, a_n\rangle\overline{c[n]} - \sum_{n=1}^M c[n]\langle a_n, x\rangle + \sum_{n=1}^M |c[n]|^2 = \|x\|^2 - \sum_{n=1}^M |c[n]|^2 = \sum_{n=M+1}^N |c[n]|^2$   (2.238)

The last equality is due to Parseval's identity $\|x\|^2 = \|c\|^2 = \sum_{n=1}^N |c[n]|^2$. When $M \to N$, the sequence $\hat x_M$ converges to $x$:

$\lim_{M\to N}\hat x_M = \lim_{M\to N}\sum_{n=1}^M c[n]a_n = \sum_{n=1}^N c[n]a_n = x$   (2.239)

and Eq. 2.238 becomes:

$\lim_{M\to N}\varepsilon = \|x\|^2 - \sum_{n=1}^N |c[n]|^2 = 0$   (2.240)

This is of course Parseval's identity $\|x\|^2 = \|c\|^2$.
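In practice the optimal coefficients of Eq. 2.230 are computed with a numerical pseudo-inverse. As a brief sketch (using the basis vectors and signal of Example 2.6, which follows), we can verify that the residual is orthogonal to the subspace, as the projection theorem requires:

```python
import numpy as np

# Approximate x in the subspace spanned by the columns of A
A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [0.0, 0.0]])              # N = 3, M = 2, basis not orthogonal
x = np.array([1.0, 2.0, 3.0])

c = np.linalg.pinv(A) @ x               # c = (A*A)^-1 A* x, Eq. (2.230)
x_hat = A @ c                           # optimal approximation in the subspace

residual = x - x_hat
print(c)                                # optimal coefficients, here [-1, 2]
print(np.allclose(A.T @ residual, 0))   # error vector orthogonal to the subspace
```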

Example 2.6: Consider a 3-D Euclidean space $R^3$ spanned by a set of three linearly independent vectors:

$a_1 = \begin{bmatrix}1\\0\\0\end{bmatrix}, \quad a_2 = \begin{bmatrix}1\\1\\0\end{bmatrix}, \quad a_3 = \begin{bmatrix}1\\1\\1\end{bmatrix}$   (2.241)

We want to find two coefficients $c[1]$ and $c[2]$ so that a given vector $x = [1, 2, 3]^T$ can be optimally approximated as $\hat x = c[1]a_1 + c[2]a_2$ in the 2-D subspace spanned by $a_1$ and $a_2$. First we construct a matrix composed of $a_1$ and $a_2$:

$A = [a_1, a_2] = \begin{bmatrix}1 & 1\\ 0 & 1\\ 0 & 0\end{bmatrix}$   (2.242)

Next we find the pseudo-inverse of $A$:

$A^\dagger = (A^TA)^{-1}A^T = \begin{bmatrix}1 & -1 & 0\\ 0 & 1 & 0\end{bmatrix}$   (2.243)

The two coefficients can then be obtained as:

$c = \begin{bmatrix}c[1]\\ c[2]\end{bmatrix} = A^\dagger x = \begin{bmatrix}1 & -1 & 0\\ 0 & 1 & 0\end{bmatrix}\begin{bmatrix}1\\2\\3\end{bmatrix} = \begin{bmatrix}-1\\2\end{bmatrix}$   (2.244)

The optimal approximation is therefore

$\hat x = c[1]a_1 + c[2]a_2 = -\begin{bmatrix}1\\0\\0\end{bmatrix} + 2\begin{bmatrix}1\\1\\0\end{bmatrix} = \begin{bmatrix}1\\2\\0\end{bmatrix}$   (2.245)

which is indeed the projection of $x = [1, 2, 3]^T$ onto the 2-D subspace spanned by $a_1$ and $a_2$.

Alternatively, if we want to approximate $x$ by $a_2$ and $a_3$ as $\hat x = c[2]a_2 + c[3]a_3$, we have:

$A = [a_2, a_3] = \begin{bmatrix}1 & 1\\ 1 & 1\\ 0 & 1\end{bmatrix}, \quad A^\dagger = \frac{1}{2}\begin{bmatrix}1 & 1 & -2\\ 0 & 0 & 2\end{bmatrix}$   (2.246)

and

$c = A^\dagger x = \begin{bmatrix}-1.5\\ 3\end{bmatrix}, \quad \hat x = c[2]a_2 + c[3]a_3 = -1.5\begin{bmatrix}1\\1\\0\end{bmatrix} + 3\begin{bmatrix}1\\1\\1\end{bmatrix} = \begin{bmatrix}1.5\\ 1.5\\ 3\end{bmatrix}$   (2.247)

If all three basis vectors can be used, then the coefficients can be found as:

$c = A^{-1}x = [a_1, a_2, a_3]^{-1}x = \begin{bmatrix}1 & -1 & 0\\ 0 & 1 & -1\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}1\\2\\3\end{bmatrix} = \begin{bmatrix}-1\\ -1\\ 3\end{bmatrix}$   (2.248)

and $x$ can be precisely represented as:

$x = c[1]a_1 + c[2]a_2 + c[3]a_3 = Ac = \begin{bmatrix}1\\2\\3\end{bmatrix}$   (2.249)

2.3.2 Signal Approximation

As discussed above, a signal vector can be represented equivalently under different bases that span the space, in terms of the total energy (Parseval's equality). However, these representations may differ drastically in terms of how the different types of information contained in the signal are concentrated in different signal components and represented by the coefficients. Sometimes certain advantages can be gained from one particular basis compared to another, depending on the specific application. In the following we consider two simple examples to illustrate such issues.

Example 2.7: Given a signal $x(t) = t$ defined over $0 \le t < 2$ (undefined outside the range), we want to optimally approximate it in a subspace spanned by the following two bases. First we use the standard functions $e_1(t)$ and $e_2(t)$:

$\hat x(t) = c[1]e_1(t) + c[2]e_2(t)$   (2.250)

where $e_1(t)$ and $e_2(t)$ are defined as:

$e_1(t) = \begin{cases}1 & 0 \le t < 1\\ 0 & 1 \le t < 2\end{cases}, \quad e_2(t) = \begin{cases}0 & 0 \le t < 1\\ 1 & 1 \le t < 2\end{cases}$   (2.251)

These two basis functions are obviously orthonormal: $\langle e_i(t), e_j(t)\rangle = \delta[i-j]$. Following the projection theorem, the coefficients $c[1]$ and $c[2]$ can be found by solving these two simultaneous equations (Eq. 2.222):

$c[1]\int_0^2 e_1(t)e_1(t)\,dt + c[2]\int_0^2 e_1(t)e_2(t)\,dt = \int_0^2 x(t)e_1(t)\,dt$
$c[1]\int_0^2 e_2(t)e_1(t)\,dt + c[2]\int_0^2 e_2(t)e_2(t)\,dt = \int_0^2 x(t)e_2(t)\,dt$

As $e_1(t)$ and $e_2(t)$ are orthonormal, the equation system becomes decoupled, and the two coefficients $c[1]$ and $c[2]$ can be obtained independently as the projections of $x(t)$ onto each of the basis functions:

$c[1] = \int_0^2 x(t)e_1(t)\,dt = \int_0^1 t\,dt = 0.5, \quad c[2] = \int_0^2 x(t)e_2(t)\,dt = \int_1^2 t\,dt = 1.5$   (2.252)

Now the signal $x(t)$ can be approximated as:

$\hat x(t) = 0.5\,e_1(t) + 1.5\,e_2(t) = \begin{cases}0.5 & 0 \le t < 1\\ 1.5 & 1 \le t < 2\end{cases}$   (2.253)

Next, we use two different basis functions $u_1(t)$ and $u_2(t)$:

$\hat x(t) = d[1]u_1(t) + d[2]u_2(t)$   (2.254)

where

$u_1(t) = \frac{1}{\sqrt2}[e_1(t) + e_2(t)] = \frac{1}{\sqrt2}, \quad u_2(t) = \frac{1}{\sqrt2}[e_1(t) - e_2(t)] = \begin{cases}1/\sqrt2 & 0 \le t < 1\\ -1/\sqrt2 & 1 \le t < 2\end{cases}$   (2.255)

Again these two basis functions are orthonormal, $\langle u_i(t), u_j(t)\rangle = \delta[i-j]$, and the two coefficients $d[1]$ and $d[2]$ can be obtained independently as:

$d[1] = \int_0^2 x(t)u_1(t)\,dt = \sqrt2, \quad d[2] = \int_0^2 x(t)u_2(t)\,dt = -\frac{1}{\sqrt2}$

The approximation is:

$\hat x(t) = \sqrt2\,u_1(t) - \frac{1}{\sqrt2}u_2(t) = \begin{cases}0.5 & 0 \le t < 1\\ 1.5 & 1 \le t < 2\end{cases}$   (2.256)

We see that the approximations based on these two different bases happen to be identical, as illustrated in Fig. 2.11. We can make the following observations:

The first basis $\{e_1(t), e_2(t)\}$ is the standard basis; the two coefficients $c[1]$ and $c[2]$ represent the average values of the signal during two consecutive time segments.

The second basis $\{u_1(t), u_2(t)\}$ represents the signal $x(t)$ in a totally different way. The first coefficient $d[1]$ represents the average of the signal (zero frequency), while the second coefficient $d[2]$ represents the variation of the signal in terms of the difference between the first half and the second. (In fact they correspond to the first two frequency components in several orthogonal transforms, including the discrete Fourier transform, discrete cosine transform, Walsh-Hadamard transform, etc.)

The second basis $\{u_1(t), u_2(t)\}$ is a rotated version of the first basis $\{e_1(t), e_2(t)\}$, as shown in Fig. 2.12, and naturally they produce the same approximation $\hat x(t)$. Consequently, the two sets of coefficients $\{c[1], c[2]\}$ and $\{d[1], d[2]\}$ are related by an orthogonal matrix corresponding to the rotation by the angle $\theta = 45°$:

$\begin{bmatrix}d[1]\\ d[2]\end{bmatrix} = \begin{bmatrix}\cos\theta & \sin\theta\\ \sin\theta & -\cos\theta\end{bmatrix}\begin{bmatrix}c[1]\\ c[2]\end{bmatrix} = \begin{bmatrix}\sqrt2/2 & \sqrt2/2\\ \sqrt2/2 & -\sqrt2/2\end{bmatrix}\begin{bmatrix}1/2\\ 3/2\end{bmatrix} = \begin{bmatrix}\sqrt2\\ -1/\sqrt2\end{bmatrix}$   (2.257)
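Both sets of coefficients in this example can be reproduced by approximating the integrals numerically; the grid resolution below is an arbitrary choice made for illustration:

```python
import numpy as np

t = np.linspace(0.0, 2.0, 2000, endpoint=False)
dt = t[1] - t[0]
x = t                                    # the signal x(t) = t on [0, 2)

e1 = (t < 1).astype(float)               # standard-like basis, Eq. (2.251)
e2 = (t >= 1).astype(float)
u1 = (e1 + e2) / np.sqrt(2)              # rotated basis, Eq. (2.255)
u2 = (e1 - e2) / np.sqrt(2)

ip = lambda f, g: np.sum(f * g) * dt     # inner product by numerical integration

c1, c2 = ip(x, e1), ip(x, e2)            # ~0.5 and ~1.5, Eq. (2.252)
d1, d2 = ip(x, u1), ip(x, u2)            # ~sqrt(2) and ~-1/sqrt(2)
print(c1, c2, d1, d2)

# The two approximations coincide (Fig. 2.11)
print(np.allclose(c1 * e1 + c2 * e2, d1 * u1 + d2 * u2))
```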

Figure 2.11 Approximation of a signal by two different basis functions

Figure 2.12 Representation of a signal vector under two different bases

Example 2.8: The temperature is measured every 3 hours in a day to obtain 8 samples as shown below:

Time (hours):    0   3   6   9  12  15  18  21
Temperature (F): 65  60  65  70  75  80  75  70

These time samples can be considered as a vector $x = [x[1], \dots, x[8]]^T = [65, 60, 65, 70, 75, 80, 75, 70]^T$ in the $R^8$ space under the implicitly used standard basis, i.e., the $n$th element $x[n]$ is the coefficient for the $n$th standard basis vector

$e_n = [0, \dots, 0, 1, 0, \dots, 0]^T$ (all elements are zero except the $n$th one), i.e.,

$x = \sum_{n=1}^8 x[n]e_n$   (2.258)

This 8-D signal vector $x$ is approximated in an M-D subspace $(M < 8)$ as shown below for different values of $M$:

$M = 1$: $x$ is approximated as $\hat x = c[1]b_1$ in a 1-D subspace spanned by $b_1 = [1, 1, 1, 1, 1, 1, 1, 1]^T$. Here the coefficient can be obtained as:

$c[1] = \frac{\langle x, b_1\rangle}{\langle b_1, b_1\rangle} = \frac{560}{8} = 70$   (2.259)

which represents the average or DC component of the daily temperature. The approximation is:

$\hat x = c[1]b_1 = [70, 70, 70, 70, 70, 70, 70, 70]^T$   (2.260)

The error vector is $\tilde x = x - \hat x = [-5, -10, -5, 0, 5, 10, 5, 0]^T$ and the error is $\|\tilde x\|^2 = 300$.

$M = 2$: $x$ can be better approximated in a 2-D subspace spanned by the same $b_1$ and a second basis vector $b_2 = [1, 1, 1, 1, -1, -1, -1, -1]^T$. As $b_2$ is orthogonal to $b_1$, its coefficient $c[2]$ can be found independently:

$c[2] = \frac{\langle x, b_2\rangle}{\langle b_2, b_2\rangle} = \frac{-40}{8} = -5$   (2.261)

which represents the temperature difference between morning and afternoon. The approximation is:

$\hat x = c[1]b_1 + c[2]b_2 = [65, 65, 65, 65, 75, 75, 75, 75]^T$   (2.262)

The error vector is $\tilde x = x - \hat x = [0, -5, 0, 5, 0, 5, 0, -5]^T$ and the error is $\|\tilde x\|^2 = 100$.

$M = 3$: The approximation can be further improved if a third basis vector $b_3 = [1, 1, -1, -1, -1, -1, 1, 1]^T$ is added. As all three basis vectors are orthogonal to each other, the coefficient $c[3]$ can also be independently obtained:

$c[3] = \frac{\langle x, b_3\rangle}{\langle b_3, b_3\rangle} = \frac{-20}{8} = -2.5$   (2.263)

which represents the temperature difference between day-time and night-time. The approximation can be expressed as:

$\hat x = c[1]b_1 + c[2]b_2 + c[3]b_3 = [62.5, 62.5, 67.5, 67.5, 77.5, 77.5, 72.5, 72.5]^T$   (2.264)

The error vector is $\tilde x = x - \hat x = [2.5, -2.5, -2.5, 2.5, -2.5, 2.5, 2.5, -2.5]^T$ and the error is $\|\tilde x\|^2 = 50$.

We can now make the following observations:

The original 8-D signal vector $x$ can be approximated by $M < 8$ basis vectors spanning an M-D subspace. As more basis vectors are included in the approximation, the error becomes progressively smaller.

A typical signal contains both slow-varying or low-frequency components and fast-varying or high-frequency components, and the former are likely to contain more energy than the latter. In order to reduce the error when approximating the signal, basis functions representing the lower frequencies should be used first.

When progressively more basis functions representing more details or subtle variations in the signal are added in the signal approximation, their coefficients are likely to have lower values compared to those for the slow-varying basis functions, and they are more likely to be affected by noise such as random fluctuations; they are therefore less significant and can be neglected without losing much essential information.

The three basis vectors $b_1$, $b_2$ and $b_3$ used above are actually the first three basis vectors of the sequency-ordered Hadamard transform, to be discussed in Chapter ??.

2.4 Frames and Biorthogonal Bases

2.4.1 Frames

Previously we considered the representation of a signal vector $x \in H$ as a linear combination of an orthogonal basis $\{u_n\}$ that spans the space:

$x = \sum_n c[n]u_n = \sum_n \langle x, u_n\rangle u_n$   (2.265)

and Parseval's identity $\|x\|^2 = \|c\|^2$ indicates that $x$ is equivalently represented by the coefficients $c$ without any redundancy. However, sometimes it may not be easy or even possible to identify a set of linearly independent and orthogonal basis vectors in the space. In such cases we could still consider representing a signal vector $x$ by a set of vectors $\{f_n\}$ which may not be linearly independent and therefore do not form a basis of the space. A main issue is the redundancy that exists among such a set of non-independent vectors. As it is now possible to find a set of coefficients $d[n]$ such that $\sum_n d[n]f_n = 0$, an immediate consequence is that the representation is no longer unique:

$x = \sum_n c[n]f_n = \sum_n c[n]f_n + \sum_n d[n]f_n = \sum_n (c[n] + d[n])f_n$   (2.266)

One consequence of the redundancy is that Parseval's identity no longer holds. The energy contained in the coefficients, $\|c\|^2$, may be either higher or lower than

the actual energy $\|x\|^2$ in the signal. We therefore need to develop some theory to address this issue when using non-independent vectors for signal representation.

First, in order for the expansion $x = \sum_n c[n]f_n$ to be a precise representation of the signal vector $x$ in terms of a set of coefficients $c[n] = \langle x, f_n\rangle$, we need to guarantee that for any vectors $x, y \in H$ the following always holds:

$\langle x, f_n\rangle = \langle y, f_n\rangle \ \text{for all } n \quad \text{iff} \quad x = y$   (2.267)

Moreover, these representations also need to be stable in the following two aspects.

Stable representation: If the difference between two vectors is small, the difference between their corresponding coefficients should also be small:

$\text{if } \|x - y\|^2 \to 0, \ \text{then } \sum_n |\langle x, f_n\rangle - \langle y, f_n\rangle|^2 \to 0$   (2.268)

i.e.,

$\sum_n |\langle x, f_n\rangle - \langle y, f_n\rangle|^2 \le B\|x - y\|^2$   (2.269)

where $0 < B < \infty$ is a positive real constant. In particular, if $y = 0$ and therefore $\langle y, f_n\rangle = 0$, we have:

$\sum_n |\langle x, f_n\rangle|^2 \le B\|x\|^2$   (2.270)

Stable reconstruction: If the difference between two sets of coefficients is small, the difference between the reconstructed vectors should also be small:

$\text{if } \sum_n |\langle x, f_n\rangle - \langle y, f_n\rangle|^2 \to 0, \ \text{then } \|x - y\|^2 \to 0$   (2.271)

i.e.,

$A\|x - y\|^2 \le \sum_n |\langle x, f_n\rangle - \langle y, f_n\rangle|^2$   (2.272)

where $0 < A < \infty$ is also a positive real constant. Again, if $y = 0$ and $\langle y, f_n\rangle = 0$, we have:

$A\|x\|^2 \le \sum_n |\langle x, f_n\rangle|^2$   (2.273)

Combining Eqs. 2.270 and 2.273, we have the following definition:

Definition: A family of finite or infinite vectors $\{f_n\}$ in a Hilbert space $H$ is a frame if there exist two real constants $0 < A \le B < \infty$, called the lower and upper bounds of the frame, such that for any $x \in H$ the following holds:

$A\|x\|^2 \le \sum_n |\langle x, f_n\rangle|^2 \le B\|x\|^2$   (2.274)
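For a frame in $C^N$, the tightest bounds $A$ and $B$ turn out to be the extreme eigenvalues of the operator $FF^*$, as will be shown in Theorem 2.9 below. A small numerical sketch of the frame condition, with an arbitrarily chosen redundant frame in $R^2$:

```python
import numpy as np

# An arbitrary redundant frame of M = 4 vectors in R^2 (columns of F)
F = np.array([[1.0, 0.0, 1.0, 2.0],
              [0.0, 1.0, 1.0, 1.0]])

lam = np.linalg.eigvalsh(F @ F.T)        # eigenvalues of FF*
A, B = lam.min(), lam.max()              # the tightest frame bounds

x = np.random.randn(2)
energy = np.sum((F.T @ x) ** 2)          # sum_n |<x, f_n>|^2
nx2 = x @ x

# Check the frame condition of Eq. (2.274)
print(A * nx2 - 1e-9 <= energy <= B * nx2 + 1e-9)
```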

In particular, if $A = B$, i.e.,

$A\|x\|^2 = \sum_n |\langle x, f_n\rangle|^2$   (2.275)

then the frame is tight.

2.4.2 Signal Expansion by Frames and Riesz Bases

Our purpose here is to represent a given signal vector $x \in H$ as a linear combination $x = \sum_n c[n]f_n$ of a set of frame vectors $\{f_n\}$. The process of finding the coefficients $c[n]$ needed in the combination can be considered as a frame transformation, denoted by $F^*$, that maps the given $x$ to a coefficient vector $c$:

$c = F^*x = [\dots, c[n], \dots]^T = [\dots, \langle x, f_n\rangle, \dots]^T$   (2.276)

where we have defined $c[n] = \langle x, f_n\rangle$, following the unitary transformation in Eq. 2.204. Here $F^*$ is the adjoint of another transformation $F$, which can be found from the following inner product in the definition of the adjoint (Eq. 2.142):

$\langle c', F^*x\rangle = \sum_n c'[n]\overline{\langle x, f_n\rangle} = \sum_n c'[n]\langle f_n, x\rangle = \langle\sum_n c'[n]f_n, x\rangle = \langle Fc', x\rangle$   (2.277)

We see that $F$ is a transformation that constructs a vector as a linear combination of the frame $\{f_n\}$ based on a given set of coefficients $c'$:

$x' = Fc' = \sum_n c'[n]f_n$   (2.278)

We further define an operator $FF^*$:

$FF^*x = F(F^*x) = Fc = \sum_n \langle x, f_n\rangle f_n$   (2.279)

Note that, different from a unitary transform satisfying $UU^* = UU^{-1} = I$, here $FF^* \ne I$ is in general not an identity operator. Applying its inverse $(FF^*)^{-1}$ to both sides of the equation above, we get:

$x = (FF^*)^{-1}Fc = (FF^*)^{-1}\left[\sum_n \langle x, f_n\rangle f_n\right] = \sum_n \langle x, f_n\rangle(FF^*)^{-1}f_n = \sum_n \langle x, f_n\rangle\tilde f_n = \sum_n c[n]\tilde f_n$   (2.280)

where we have defined $\tilde f_n$, called the dual vector of $f_n$, as:

$\tilde f_n = (FF^*)^{-1}f_n, \quad \text{i.e.} \quad f_n = (FF^*)\tilde f_n$   (2.281)

Note that $(FF^*)^{-1}F$ above is actually the pseudo-inverse $(F^*)^\dagger$ of $F^*$, satisfying:

$(F^*)^\dagger F^* = (FF^*)^{-1}FF^* = I$   (2.282)

We can define this as another transformation $\tilde F$:

$\tilde F = (FF^*)^{-1}F = (F^*)^\dagger$   (2.283)

and rewrite Eq. 2.280 as:

$x = \tilde Fc = \tilde F[\dots, c[n], \dots]^T = \sum_n \langle x, f_n\rangle\tilde f_n = \sum_n c[n]\tilde f_n$   (2.284)

This is the inverse frame transformation, which reconstructs the vector $x$ from the coefficients $c$ obtained by the forward frame transformation in Eq. 2.276. Eqs. 2.276 and 2.284 form a frame transformation pair, similar to the unitary transformation pair in Eq. 2.204.

We can find the adjoint of $\tilde F$ from the following inner product (by reversing the steps in Eq. 2.277):

$\langle\tilde Fc, x\rangle = \langle\sum_n c[n]\tilde f_n, x\rangle = \sum_n c[n]\langle\tilde f_n, x\rangle = \sum_n c[n]\overline{\langle x, \tilde f_n\rangle} = \langle c, \tilde F^*x\rangle$   (2.285)

Here $\tilde F^*$ is the adjoint of $\tilde F$:

$\tilde F^*x = [\dots, \langle x, \tilde f_n\rangle, \dots]^T = [\dots, d[n], \dots]^T = d$   (2.286)

where we have defined $d[n] = \langle x, \tilde f_n\rangle$. Replacing $F$ by $\tilde F$ in Eq. 2.283, we get

$(\tilde F\tilde F^*)^{-1}\tilde F = F$   (2.287)

i.e., the dual of the dual frame is the original frame, and $F = (\tilde F^*)^\dagger$ is the pseudo-inverse of $\tilde F^*$, satisfying:

$\tilde FF^* = (FF^*)^{-1}FF^* = I, \quad \text{and similarly} \quad F\tilde F^* = I$   (2.288)

Theorem 2.8. A vector $x \in H$ can be equivalently represented by either of the two dual frames $\{f_n\}$ or $\{\tilde f_n\}$:

$x = \sum_n \langle x, f_n\rangle\tilde f_n = \sum_n \langle x, \tilde f_n\rangle f_n$   (2.289)

Proof: Consider the inner product $\langle x, x\rangle$, with the first $x$ replaced by the expression in Eq. 2.280:

$\langle x, x\rangle = \langle\sum_n \langle x, f_n\rangle\tilde f_n, x\rangle = \sum_n \langle x, f_n\rangle\langle\tilde f_n, x\rangle = \sum_n \overline{\langle x, \tilde f_n\rangle}\langle x, f_n\rangle = \langle x, \sum_n \langle x, \tilde f_n\rangle f_n\rangle$   (2.290)

Comparing the two sides of the equation, we get:

$x = \sum_n \langle x, \tilde f_n\rangle f_n = \sum_n d[n]f_n$   (2.291)

Combining this result with Eq. 2.280, we get Eq. 2.289. Q.E.D.

Note that according to Eq. 2.286, Eq. 2.291 can also be written as:

$x = \sum_n \langle x, \tilde f_n\rangle f_n = \sum_n d[n]f_n = Fd$   (2.292)

We can combine Eqs. 2.276 and 2.284, together with Eqs. 2.286 and 2.292, to form two alternative frame transformation pairs based on either the frame $\{f_n\}$ or its dual $\{\tilde f_n\}$:

$\begin{cases} c[n] = \langle x, f_n\rangle\\ x = \sum_n c[n]\tilde f_n = \sum_n\langle x, f_n\rangle\tilde f_n\end{cases} \qquad \begin{cases} d[n] = \langle x, \tilde f_n\rangle\\ x = \sum_n d[n]f_n = \sum_n\langle x, \tilde f_n\rangle f_n\end{cases}$   (2.293)

These equations are respectively the forward and inverse frame transformations of $x$ based on the frame $\{f_n\}$ and its dual $\{\tilde f_n\}$, which can also be expressed (due to Eqs. 2.284 and 2.292) more concisely as:

$\begin{cases} c = F^*x\\ x = \tilde Fc = (F^*)^\dagger c\end{cases} \qquad \begin{cases} d = \tilde F^*x\\ x = Fd\end{cases}$   (2.294)

The frame transformation pairs in Eq. 2.293 can be considered as a generalization of the unitary transformation given in Eq. 2.204, which is carried out by $U^*$ and its inverse $U$, while the frame transformation pairs are carried out by $F^*$ (or $\tilde F^*$) and its pseudo-inverse $\tilde F = (F^*)^\dagger$ (or $F = (\tilde F^*)^\dagger$). We also see from Eq. 2.294 that

$F\tilde F^*x = \tilde FF^*x = x$   (2.295)

i.e., $F\tilde F^* = \tilde FF^* = I$, similar to $U^{-1}U = U^*U = I$.

Also, similar to the unitary transformation, the signal energy is conserved by the frame transformation:

$\|x\|^2 = \langle x, x\rangle = \langle\tilde Fc, x\rangle = \langle c, \tilde F^*x\rangle = \langle c, d\rangle = \langle Fd, x\rangle = \langle d, F^*x\rangle = \langle d, c\rangle$   (2.296)

This relationship can be considered as a generalized version of Parseval's identity. However, we note that:

$\|c\|^2 = \langle c, c\rangle = \langle F^*x, F^*x\rangle = \langle FF^*x, x\rangle \ne \langle x, x\rangle = \|x\|^2$
$\|d\|^2 = \langle d, d\rangle = \langle\tilde F^*x, \tilde F^*x\rangle = \langle\tilde F\tilde F^*x, x\rangle \ne \langle x, x\rangle = \|x\|^2$   (2.297)

To find out how the signal energy is related to the energy contained in either of the two sets of coefficients, we need to study further the operator $FF^*$. Consider

the inner product of Eq. 2.279 and another vector $y$:

$\langle FF^*x, y\rangle = \sum_n \langle x, f_n\rangle\langle f_n, y\rangle = \langle x, \sum_n \langle y, f_n\rangle f_n\rangle = \langle x, FF^*y\rangle$   (2.298)

which indicates that $FF^*$ is a self-adjoint operator. If we let $\{\lambda_n\}$ and $\{\phi_n\}$ be the eigenvalues and eigenvectors of $FF^*$, i.e.,

$FF^*\phi_n = \lambda_n\phi_n, \quad \text{(for all } n)$   (2.299)

then all $\{\lambda_n\}$ are real, all $\{\phi_n\}$ are orthogonal, $\langle\phi_m, \phi_n\rangle = \delta[m-n]$, and they form a complete orthogonal system (Theorem 2.4). Now $x$ can also be expanded in terms of these eigenvectors as:

$x = \sum_n \langle x, \phi_n\rangle\phi_n$   (2.300)

and the energy contained in $x$ is:

$\|x\|^2 = \langle x, x\rangle = \langle\sum_m \langle x, \phi_m\rangle\phi_m, \sum_n \langle x, \phi_n\rangle\phi_n\rangle = \sum_m\sum_n \langle x, \phi_m\rangle\overline{\langle x, \phi_n\rangle}\langle\phi_m, \phi_n\rangle = \sum_n |\langle x, \phi_n\rangle|^2$   (2.301)

For the dual frame transformation $\tilde F$, we have:

$\tilde F\tilde F^* = [(FF^*)^{-1}F][(FF^*)^{-1}F]^* = (FF^*)^{-1}FF^*(FF^*)^{-1} = (FF^*)^{-1}$   (2.302)

which is also a self-adjoint operator, whose eigenvalues and eigenvectors are respectively $\{1/\lambda_n\}$ and $\{\phi_n\}$, i.e.:

$\tilde F\tilde F^*\phi_n = (FF^*)^{-1}\phi_n = \frac{1}{\lambda_n}\phi_n, \quad \text{(for all } n)$   (2.303)

Theorem 2.9. The frame transformation coefficients $c[n] = \langle x, f_n\rangle$ and $d[n] = \langle x, \tilde f_n\rangle$ satisfy respectively the following inequalities:

$\lambda_{min}\|x\|^2 \le \sum_n |\langle x, f_n\rangle|^2 = \|c\|^2 = \|F^*x\|^2 \le \lambda_{max}\|x\|^2$   (2.304)

$\frac{1}{\lambda_{max}}\|x\|^2 \le \sum_n |\langle x, \tilde f_n\rangle|^2 = \|d\|^2 = \|\tilde F^*x\|^2 \le \frac{1}{\lambda_{min}}\|x\|^2$   (2.305)

where $\lambda_{min}$ and $\lambda_{max}$ are respectively the smallest and largest eigenvalues of the self-adjoint operator $FF^*$. When all eigenvalues are the same, $\lambda_{max} = \lambda_{min} = \lambda$ and the frame is tight:

$\sum_n |\langle x, f_n\rangle|^2 = \lambda\|x\|^2, \quad \sum_n |\langle x, \tilde f_n\rangle|^2 = \frac{1}{\lambda}\|x\|^2$   (2.306)

Proof: Applying $(FF^*)^{-1}$ to both sides of Eq. 2.289, we get:

$(FF^*)^{-1}x = \sum_n \langle x, \tilde f_n\rangle(FF^*)^{-1}f_n = \sum_n \langle x, \tilde f_n\rangle\tilde f_n$   (2.307)

This result and Eq. 2.279 form a symmetric pair:

$(FF^*)x = \sum_n \langle x, f_n\rangle f_n$   (2.308)

$(FF^*)^{-1}x = \sum_n \langle x, \tilde f_n\rangle\tilde f_n$   (2.309)

Taking the inner product of each of these equations with $x$, we get:

$\langle(FF^*)x, x\rangle = \sum_n \langle x, f_n\rangle\langle f_n, x\rangle = \sum_n |\langle x, f_n\rangle|^2 = \sum_n |c[n]|^2 = \|c\|^2$   (2.310)

$\langle(FF^*)^{-1}x, x\rangle = \sum_n \langle x, \tilde f_n\rangle\langle\tilde f_n, x\rangle = \sum_n |\langle x, \tilde f_n\rangle|^2 = \sum_n |d[n]|^2 = \|d\|^2$   (2.311)

These two expressions represent the energy contained in each of the two sets of coefficients $c[n] = \langle x, f_n\rangle$ and $d[n] = \langle x, \tilde f_n\rangle$. We will now carry out the following two parallel steps. First, we apply $FF^*$ to both sides of Eq. 2.300:

$FF^*x = FF^*\left(\sum_n \langle x, \phi_n\rangle\phi_n\right) = \sum_n \langle x, \phi_n\rangle FF^*\phi_n = \sum_n \langle x, \phi_n\rangle\lambda_n\phi_n$   (2.312)

and take the inner product with $x$ on both sides:

$\langle FF^*x, x\rangle = \langle\sum_n \langle x, \phi_n\rangle\lambda_n\phi_n, x\rangle = \sum_n \langle x, \phi_n\rangle\lambda_n\langle\phi_n, x\rangle = \sum_n \lambda_n|\langle x, \phi_n\rangle|^2$   (2.313)

Replacing the left-hand side by Eq. 2.310, we get:

$\sum_n |\langle x, f_n\rangle|^2 = \sum_n \lambda_n|\langle x, \phi_n\rangle|^2$   (2.314)

Applying Eq. 2.301 to the right-hand side, we get:

$\lambda_{min}\|x\|^2 \le \sum_n |\langle x, f_n\rangle|^2 \le \lambda_{max}\|x\|^2$   (2.315)

Next, we apply $(FF^*)^{-1}$ to both sides of Eq. 2.300:

$(FF^*)^{-1}x = \sum_n \langle x, \phi_n\rangle(FF^*)^{-1}\phi_n = \sum_n \langle x, \phi_n\rangle\frac{1}{\lambda_n}\phi_n$   (2.316)

and take the inner product with $x$ on both sides:

$\langle(FF^*)^{-1}x, x\rangle = \sum_n \langle x, \phi_n\rangle\frac{1}{\lambda_n}\langle\phi_n, x\rangle = \sum_n \frac{1}{\lambda_n}|\langle x, \phi_n\rangle|^2$   (2.317)

Replacing the left-hand side by Eq. 2.311, we get:

$\sum_n |\langle x, \tilde f_n\rangle|^2 = \sum_n \frac{1}{\lambda_n}|\langle x, \phi_n\rangle|^2$   (2.318)

Applying Eq. 2.301 to the right-hand side, we get:

$\frac{1}{\lambda_{max}}\|x\|^2 \le \sum_n |\langle x, \tilde f_n\rangle|^2 \le \frac{1}{\lambda_{min}}\|x\|^2$   (2.319)

Q.E.D.

This theorem indicates that the frame transformation associated with either $F^*$ or $\tilde F^*$ does not conserve signal energy, due obviously to the redundancy of the non-independent frame vectors. However, as shown in Eq. 2.296, the energy is conserved when both sets of coefficients are involved.

Theorem 2.10. Let $\lambda_k$ and $\phi_k$ be the $k$th eigenvalue and the corresponding eigenvector of the operator $FF^*$: $FF^*\phi_k = \lambda_k\phi_k$ for all $k$. Then

$\sum_k \lambda_k = \sum_n \|f_n\|^2, \quad \sum_k \frac{1}{\lambda_k} = \sum_n \|\tilde f_n\|^2$   (2.320)

Proof: As $FF^*$ is self-adjoint, its eigenvalues $\lambda_k$ are real and its eigenfunctions are orthogonal, $\langle\phi_k, \phi_l\rangle = \delta[k-l]$; we therefore have:

$\sum_k \lambda_k = \sum_k \lambda_k\langle\phi_k, \phi_k\rangle = \sum_k \langle FF^*\phi_k, \phi_k\rangle = \sum_k \langle\sum_n \langle\phi_k, f_n\rangle f_n, \phi_k\rangle = \sum_k\sum_n |\langle f_n, \phi_k\rangle|^2$   (2.321)

On the other hand:

$\|f_n\|^2 = \langle f_n, f_n\rangle = \langle\sum_k \langle f_n, \phi_k\rangle\phi_k, \sum_l \langle f_n, \phi_l\rangle\phi_l\rangle = \sum_k\sum_l \langle f_n, \phi_k\rangle\overline{\langle f_n, \phi_l\rangle}\langle\phi_k, \phi_l\rangle = \sum_k |\langle f_n, \phi_k\rangle|^2$   (2.322)

Therefore we get

$\sum_n \|f_n\|^2 = \sum_n\sum_k |\langle f_n, \phi_k\rangle|^2 = \sum_k \lambda_k$   (2.323)

The second equation in the theorem can be similarly proved. Q.E.D.
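Theorems 2.9 and 2.10 are easily checked numerically. The following sketch, with an arbitrarily chosen frame of four vectors in $R^2$, verifies the trace identities of Eq. 2.320, the mixed energy identity of Eq. 2.296, and the reconstructions of Eq. 2.289:

```python
import numpy as np

F = np.array([[1.0, 2.0, 0.0, -1.0],
              [1.0, 0.0, 1.0,  1.0]])   # an arbitrary frame of 4 vectors in R^2
S = F @ F.T                              # the operator FF*
F_dual = np.linalg.inv(S) @ F            # dual frame vectors, Eq. (2.281)
lam = np.linalg.eigvalsh(S)

# Theorem 2.10: sum of eigenvalues = sum of ||f_n||^2, and reciprocals for the dual
print(np.isclose(lam.sum(), np.sum(F ** 2)))
print(np.isclose((1.0 / lam).sum(), np.sum(F_dual ** 2)))

x = np.random.randn(2)
c = F.T @ x                              # c[n] = <x, f_n>
d = F_dual.T @ x                         # d[n] = <x, dual of f_n>

print(np.isclose(c @ d, x @ x))          # mixed energy identity, Eq. (2.296)
print(np.allclose(F_dual @ c, x))        # x = sum_n c[n] dual(f_n), Eq. (2.289)
print(np.allclose(F @ d, x))             # x = sum_n d[n] f_n
```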

Definition: If the vectors in a frame are linearly independent, the frame is called a Riesz basis.

Theorem 2.11. (Biorthogonality of a Riesz basis) A Riesz basis $\{f_n\}$ and its dual $\{\tilde f_n\}$ form a pair of biorthogonal bases satisfying

$\langle f_m, \tilde f_n\rangle = \delta[m-n], \quad m, n \in Z$   (2.324)

Proof: We let $x = f_m$ in Eq. 2.289 and get:

$f_m = \sum_n \langle f_m, \tilde f_n\rangle f_n$   (2.325)

Since these vectors are linearly independent, i.e., $f_m$ cannot be expressed as a linear combination of the rest of the frame vectors, the equation above has only one interpretation: $\langle f_m, \tilde f_n\rangle = 0$ for all $n \ne m$, while $\langle f_m, \tilde f_m\rangle = 1$ when $n = m$. In other words, these frame vectors are orthogonal to their dual vectors, i.e., Eq. 2.324 holds. Q.E.D.

If the dual frames $\{f_n\}$ and $\{\tilde f_n\}$ in Theorem 2.8 are a pair of biorthogonal bases, then Eq. 2.289 is a biorthogonal transformation:

$x = \sum_n \langle x, f_n\rangle\tilde f_n = \sum_n \langle x, \tilde f_n\rangle f_n$   (2.326)

In summary, we see that signal representation by a set of linearly independent and orthogonal basis vectors, $x = \sum_n c[n]b_n = \sum_n \langle x, b_n\rangle b_n$ (Eq. 2.88), is now much generalized, so that the signal is represented by a set of frame vectors which are in general neither linearly independent nor orthogonal. The representation can be in either of the two dual frames, and the frame transformation and its inverse are pseudo-inverses of each other. Moreover, the signal energy is no longer conserved by the transformation, as Parseval's identity is invalid due to the redundancy in the frame. Instead, the signal energy and the energy in the coefficients are related by Eqs. 2.304, 2.305, and 2.296.

On the other hand, we can consider the unitary transformation $U$ as a special kind of frame transformation $F$. As $UU^* = U^*U = I$, the dual frame in Eq. 2.281 becomes $\tilde U = (UU^*)^{-1}U = U$, i.e., $U$ is the same as its dual, $u_n = \tilde u_n$; the pseudo-inverse $(U^*)^\dagger = (UU^*)^{-1}U = U = (U^*)^{-1}$ becomes a regular inverse; and the biorthogonality in Eq. 2.324 becomes regular orthogonality. Consequently, the two dual transformation pairs in Eq. 2.293 (or 2.294) become identical: a unitary transformation pair. Also, corresponding to the eigenequations of the operators $FF^*$ (Eq. 2.299) and $\tilde F\tilde F^*$ (Eq. 2.303), the eigenequation of the operator $UU^* = I$ becomes a trivial case:

$UU^*\phi_n = U^*U\phi_n = I\phi_n = \lambda_n\phi_n = \phi_n$   (2.327)

with $\lambda_{max} = \lambda_{min} = \lambda_n = 1$ (for all $n$), and both Eqs. 2.304 and 2.305 (as well as Eq. 2.296) become Parseval's identity (Eq. 2.188):

$\|x\|^2 = \sum_n |\langle x, u_n\rangle|^2$   (2.328)

2.4.3 Frames in Finite-Dimensional Space

Here we consider the frame transformation in $C^N$. Let $F = [f_1, \dots, f_M]$ be a matrix composed of a set of $M$ frame vectors as its columns. We assume $M > N$, so the $M$ frame vectors are obviously not independent. The dual frame is also a matrix, composed of the $M$ dual vectors as its columns: $\tilde F = [\tilde f_1, \dots, \tilde f_M]$. Any given vector $x \in C^N$ can now be represented by either the frame $F$ (second transformation pair in Eq. 2.294) or its dual $\tilde F$ (first transformation pair in Eq. 2.294), in the form of a matrix multiplication (the generic operator $F$ becomes a matrix $F$):

$\begin{cases} c = F^*x\\ x = \tilde Fc = (F^*)^\dagger c\end{cases} \qquad \begin{cases} d = \tilde F^*x\\ x = Fd\end{cases}$   (2.329)

These frame transformations are in the same form as the unitary transformation pair (Eq. 2.203). However, different from the matrices $U$ and $U^* = U^{-1}$ there, here the matrices $F^*$ and $\tilde F^*$ in Eq. 2.329 are not invertible, as they are not square ($F$ and $\tilde F$ are $N$ by $M$, while $F^*$ and $\tilde F^*$ are $M$ by $N$). Consequently, the matrices used in the forward and inverse frame transformations are pseudo-inverses of each other:

$(\tilde F^*)^\dagger = (\tilde F\tilde F^*)^{-1}\tilde F = F, \quad (F^*)^\dagger = (FF^*)^{-1}F = \tilde F$   (2.330)

We first represent $x$ in terms of $F$. Based on the second transformation pair in Eq. 2.329, the coefficients $d$ can be obtained as:

$d = \tilde F^*x = [\tilde f_1^*, \dots, \tilde f_M^*]^Tx = [\langle x, \tilde f_1\rangle, \dots, \langle x, \tilde f_M\rangle]^T$   (2.331)

and $x$ is reconstructed by the inverse transformation:

$x = Fd = [f_1, \dots, f_M][\langle x, \tilde f_1\rangle, \dots, \langle x, \tilde f_M\rangle]^T = \sum_{n=1}^M \langle x, \tilde f_n\rangle f_n$   (2.332)

Alternatively, we can also represent $x$ in terms of the dual frame $\tilde F$. Based on the first transformation pair in Eq. 2.329, the coefficients $c$ can be obtained as:

$c = F^*x = [f_1^*, \dots, f_M^*]^Tx = [\langle x, f_1\rangle, \dots, \langle x, f_M\rangle]^T$   (2.333)

and $x$ is reconstructed by the inverse transformation:

$x = \tilde Fc = [\tilde f_1, \dots, \tilde f_M][\langle x, f_1\rangle, \dots, \langle x, f_M\rangle]^T = \sum_{n=1}^M \langle x, f_n\rangle\tilde f_n$   (2.334)

Theorem 2.12. If a frame $F = [f_1, \dots, f_M]$ in $C^N$ is tight, i.e., all eigenvalues $\lambda_k = \lambda$ of $FF^*$ are the same, and all frame vectors are normalized, $\|f_n\| = 1$, then the frame bound is $M/N$.

Proof: As $FF^*$ is an $N$ by $N$ matrix, it has $N$ eigenvalues $\lambda_k = \lambda$ $(k = 1, \dots, N)$. Then Theorem 2.10 becomes:

$\sum_{k=1}^N \lambda_k = N\lambda = \sum_{n=1}^M \|f_n\|^2 = M$   (2.335)

i.e., $\lambda = M/N$. Q.E.D.

In particular, if $M = N$ linearly independent frame vectors are used, then they form a Riesz basis in $C^N$, $F = [f_1, \dots, f_N]$ becomes an $N$ by $N$ invertible matrix, its pseudo-inverse is just a regular inverse, and the second equation in Eq. 2.330 becomes $(F^*)^{-1} = \tilde F$, i.e.,

$\tilde F^*F = [\tilde f_1^*, \dots, \tilde f_N^*]^T[f_1, \dots, f_N] = I$   (2.336)

which indicates that these Riesz basis vectors are indeed biorthogonal:

$\langle f_m, \tilde f_n\rangle = \delta[m-n], \quad (m, n = 1, \dots, N)$   (2.337)

Moreover, if these $N$ vectors are also orthonormal, i.e., $\langle f_m, f_n\rangle = \delta[m-n]$, then $F = U$ becomes a unitary matrix satisfying $U^* = U^{-1}$, and $\tilde U = (UU^*)^{-1}U = U$, i.e., the vectors are their own duals, and they form an orthonormal basis of $C^N$. Now the frame transformation becomes a unitary transformation $U^*x = c$, and the inverse is simply $Uc = x$. Also, the eigenvalues of $UU^* = I$ are all $\lambda_n = 1$ and $\|u_n\|^2 = 1$, so Theorem 2.10 holds trivially.

Example 2.9: Three normalized vectors in $R^2$ form a frame:

$F = [f_1, f_2, f_3] = \begin{bmatrix}1 & -1/2 & -1/2\\ 0 & \sqrt3/2 & -\sqrt3/2\end{bmatrix}$   (2.338)

Note that these frame vectors are normalized, $\|f_n\| = 1$. We also have:

$FF^T = \frac{3}{2}\begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}, \quad (FF^T)^{-1} = \frac{2}{3}\begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}$   (2.339)

The eigenvalues of these two matrices are obviously $\lambda_1 = \lambda_2 = 3/2$ and $1/\lambda_1 = 1/\lambda_2 = 2/3$, respectively, indicating that this is a tight frame ($A = B = M/N = 3/2$). The dual frame $\tilde F$ can be found as the pseudo-inverse of $F^T$:

$\tilde F = [\tilde f_1, \tilde f_2, \tilde f_3] = (FF^T)^{-1}F = \frac{2}{3}F = \begin{bmatrix}2/3 & -1/3 & -1/3\\ 0 & \sqrt3/3 & -\sqrt3/3\end{bmatrix}$   (2.340)

Any given $x = [x[1], x[2]]^T$ can be expanded in terms of either of the two frames:

$x = \sum_{n=1}^3 c[n]\tilde f_n = \sum_{n=1}^3 \langle x, f_n\rangle\tilde f_n = \sum_{n=1}^3 d[n]f_n = \sum_{n=1}^3 \langle x, \tilde f_n\rangle f_n$   (2.341)

where $c = F^Tx$, or

$c[1] = x[1], \quad c[2] = \frac{1}{2}[-x[1] + \sqrt3 x[2]], \quad c[3] = -\frac{1}{2}[x[1] + \sqrt3 x[2]]$   (2.342)

and $d = \tilde F^Tx$, or

$d[1] = \frac{2}{3}x[1], \quad d[2] = \frac{1}{3}[-x[1] + \sqrt3 x[2]], \quad d[3] = -\frac{1}{3}[x[1] + \sqrt3 x[2]]$   (2.343)

The energy contained in the coefficients $c$ and $d$ is respectively:

$\|c\|^2 = \sum_{n=1}^3 |\langle x, f_n\rangle|^2 = \frac{3}{2}\|x\|^2 = \lambda\|x\|^2$   (2.344)

and

$\|d\|^2 = \sum_{n=1}^3 |\langle x, \tilde f_n\rangle|^2 = \frac{2}{3}\|x\|^2 = \frac{1}{\lambda}\|x\|^2$   (2.345)

Specifically, if we let $x = [1, 2]^T$, then

$c = F^Tx = \begin{bmatrix}\langle x, f_1\rangle\\ \langle x, f_2\rangle\\ \langle x, f_3\rangle\end{bmatrix} = \begin{bmatrix}1\\ (-1+2\sqrt3)/2\\ -(1+2\sqrt3)/2\end{bmatrix}$   (2.346)

and

$d = \tilde F^Tx = \begin{bmatrix}\langle x, \tilde f_1\rangle\\ \langle x, \tilde f_2\rangle\\ \langle x, \tilde f_3\rangle\end{bmatrix} = \begin{bmatrix}2/3\\ (-1+2\sqrt3)/3\\ -(1+2\sqrt3)/3\end{bmatrix}$   (2.347)

Example 2.10: Vectors $f_1$ and $f_2$ form a basis that spans the 2-D space:

$f_1 = \begin{bmatrix}1\\0\end{bmatrix}, \quad f_2 = \begin{bmatrix}1\\1\end{bmatrix}, \quad F = [f_1, f_2] = \begin{bmatrix}1 & 1\\ 0 & 1\end{bmatrix}$   (2.348)

$FF^T = \begin{bmatrix}2 & 1\\ 1 & 1\end{bmatrix}, \quad (FF^T)^{-1} = \begin{bmatrix}1 & -1\\ -1 & 2\end{bmatrix}$   (2.349)

The dual frame can be found to be:

$\tilde F = (FF^T)^{-1}F = \begin{bmatrix}1 & 0\\ -1 & 1\end{bmatrix}, \quad \text{i.e.} \quad \tilde f_1 = \begin{bmatrix}1\\ -1\end{bmatrix}, \quad \tilde f_2 = \begin{bmatrix}0\\ 1\end{bmatrix}$   (2.350)

Obviously the biorthogonality condition in Eq. 2.337 is satisfied by these two sets of basis vectors. Next, to represent a vector $x = [0, 2]^T$ by each of the two bases, we find the coefficients as:

$c[1] = \langle x, f_1\rangle = 0, \quad c[2] = \langle x, f_2\rangle = 2; \qquad d[1] = \langle x, \tilde f_1\rangle = -2, \quad d[2] = \langle x, \tilde f_2\rangle = 2$   (2.351)

Now we have:

$x = c[1]\tilde f_1 + c[2]\tilde f_2 = 2\begin{bmatrix}0\\1\end{bmatrix} = \begin{bmatrix}0\\2\end{bmatrix}, \quad \text{or} \quad x = d[1]f_1 + d[2]f_2 = -2\begin{bmatrix}1\\0\end{bmatrix} + 2\begin{bmatrix}1\\1\end{bmatrix} = \begin{bmatrix}0\\2\end{bmatrix}$   (2.352)

2.5 Kernel Function and Mercer's Theorem

Definition: A kernel is a function that maps two continuous variables $t, \tau$ to a complex value $K(t, \tau) \in C$. If the two variables are truncated and sampled to become discrete $t_m, t_n$ $(m, n = 1, \dots, N)$, the kernel can be represented by an $N$ by $N$ matrix with the $mn$th element being $K(t_m, t_n) = K[m, n]$. If $K(t, \tau) = \overline{K(\tau, t)}$ or $K[m, n] = \overline{K[n, m]}$, the kernel is Hermitian (self-adjoint).

Definition: A continuous kernel $K(t, \tau)$ is positive definite if the following holds for any function $x(t)$ defined over $[a, b]$:

$\int_a^b\int_a^b x(t)K(t, \tau)\overline{x(\tau)}\,d\tau\,dt > 0$   (2.353)

A discrete kernel $K[m, n]$ is positive definite if the following holds for any vector $x = [x[1], \dots, x[N]]$:

$\sum_{m=1}^N\sum_{n=1}^N x[m]K[m, n]\overline{x[n]} > 0$   (2.354)
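As a small numerical illustration of the discrete definition in Eq. 2.354, the following sketch builds a Gaussian (RBF) Gram matrix, a standard example of a positive definite Hermitian kernel; the sample points and kernel width are arbitrary choices:

```python
import numpy as np

# A Gaussian (RBF) Gram matrix: K[m, n] = exp(-(t_m - t_n)^2 / 2)
t = np.linspace(0.0, 1.0, 8)
K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2)

print(np.allclose(K, K.T))               # Hermitian (real symmetric here)

# Eq. (2.354): the quadratic form is positive for any nonzero x
rng = np.random.default_rng(1)
for _ in range(5):
    x = rng.standard_normal(8)
    print(x @ K @ x > 0)
```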

Definition: The operator $T_K$ associated with a continuous kernel $K(t, \tau)$ is defined as:

$T_Kx(t) = \int_a^b K(t, \tau)x(\tau)\,d\tau = y(t)$   (2.355)

The operator $T_K$ associated with a discrete kernel $K[m, n]$ is a matrix:

$T_K = T = \begin{bmatrix}K[1, 1] & K[1, 2] & \cdots & K[1, N]\\ K[2, 1] & K[2, 2] & \cdots & K[2, N]\\ \vdots & \vdots & \ddots & \vdots\\ K[N, 1] & K[N, 2] & \cdots & K[N, N]\end{bmatrix}$   (2.356)

which can be applied to a vector $x$ to generate $T_Kx = Tx = y$, or in component form:

$\sum_{m=1}^N K[m, n]x[m] = y[n], \quad (n = 1, \dots, N)$   (2.357)

Theorem 2.13. The operator $T_K$ associated with a Hermitian kernel is Hermitian (self-adjoint):

$\langle T_Kx(t), y(t)\rangle = \langle x(t), T_Ky(t)\rangle$   (2.358)

Proof: For the operator $T_K$ associated with a continuous kernel, we have:

$\langle T_Kx(t), y(t)\rangle = \int_a^b T_Kx(t)\overline{y(t)}\,dt = \int_a^b\left[\int_a^b K(t, \tau)x(\tau)\,d\tau\right]\overline{y(t)}\,dt = \int_a^b x(\tau)\overline{\left[\int_a^b K(\tau, t)y(t)\,dt\right]}\,d\tau = \int_a^b x(\tau)\overline{T_Ky(\tau)}\,d\tau = \langle x(t), T_Ky(t)\rangle$   (2.359)

where we have used the Hermitian property $\overline{K(t, \tau)} = K(\tau, t)$. For the operator $T_K = T$ associated with a discrete kernel, we have:

$\langle Tx, y\rangle = \sum_{n=1}^N\left[\sum_{m=1}^N K[m, n]x[m]\right]\overline{y[n]} = \sum_{m=1}^N x[m]\overline{\left[\sum_{n=1}^N K[n, m]y[n]\right]} = \langle x, Ty\rangle$   (2.360)

Q.E.D.

A self-adjoint operator $T_K$ has all the properties stated in Theorem 2.4. Specifically, let $\lambda_k$ be the $k$th eigenvalue of the self-adjoint operator $T_K$, and $\phi_k(t)$ or $\phi_k$ the corresponding eigenfunction or eigenvector:

$\int_a^b K(t, \tau)\phi_k(\tau)\,d\tau = \lambda_k\phi_k(t), \quad \text{or} \quad T_K\phi_k = T\phi_k = \lambda_k\phi_k$   (2.361)

Then we have:

1. All eigenvalues $\lambda_k$ are real;
2. All eigenfunctions/eigenvectors are mutually orthogonal:

$\langle\phi_k(t), \phi_l(t)\rangle = \langle\phi_k, \phi_l\rangle = \delta[k-l]$   (2.362)

3. All eigenfunctions/eigenvectors form a complete orthogonal system, i.e., they form a basis that spans the function/vector space.

Theorem 2.14. (Mercer's Theorem) Let $\lambda_k$ and $\phi_k(t)$ $(k = 1, 2, \dots)$ be respectively the $k$th eigenvalue and the corresponding eigenfunction of the operator $T_K$ associated with a positive definite Hermitian kernel $K(t, \tau)$; then the kernel can be expanded as:

$K(t, \tau) = \sum_{k=1}^\infty \lambda_k\phi_k(t)\overline{\phi_k(\tau)}$   (2.363)

Let $\lambda_k$ and $\phi_k$ $(k = 1, 2, \dots)$ be the $k$th eigenvalue and the corresponding eigenvector of the operator $T$ associated with a positive definite Hermitian kernel $K[m, n]$; then the kernel can be expanded as:

$K[m, n] = \sum_{k=1}^N \lambda_k\phi[m, k]\overline{\phi[n, k]}, \quad (m, n = 1, \dots, N)$   (2.364)

where $\phi[m, k]$ is the $m$th element of the $k$th eigenvector $\phi_k = [\phi[1, k], \dots, \phi[N, k]]^T$.

The general proof of Mercer's theorem in Hilbert space is beyond the scope of this book and therefore omitted, but the discrete version in $C^N$ given in Eq. 2.364 is simply the element form of the following expansion, valid for any Hermitian matrix (Eq. 2.164):

$T = \sum_{k=1}^N \lambda_k\phi_k\phi_k^*$   (2.365)

Note that, given Eq. 2.363 in Mercer's theorem, Eq. 2.361 can be easily derived:

$\int_a^b K(t, \tau)\phi_l(\tau)\,d\tau = \int_a^b\left[\sum_{k=1}^\infty \lambda_k\phi_k(t)\overline{\phi_k(\tau)}\right]\phi_l(\tau)\,d\tau = \sum_{k=1}^\infty \lambda_k\phi_k(t)\int_a^b \overline{\phi_k(\tau)}\phi_l(\tau)\,d\tau = \sum_{k=1}^\infty \lambda_k\phi_k(t)\delta[k-l] = \lambda_l\phi_l(t)$   (2.366)

For example, consider the covariance of a centered stochastic process $x(t)$ with $\mu_x(t) = 0$:

$\sigma_x^2(t, \tau) = E[x(t)\overline{x(\tau)}] = \overline{E[x(\tau)\overline{x(t)}]} = \overline{\sigma_x^2(\tau, t)}$   (2.367)

which is a Hermitian kernel $K(t, \tau) = \sigma_x^2(t, \tau)$ that maps the two variables $t$ and $\tau$ to a complex value. Moreover, we can show that it is also positive definite:

$\int_a^b\int_a^b f(t)\sigma_x^2(t, \tau)\overline{f(\tau)}\,dt\,d\tau = \int_a^b\int_a^b E[f(t)x(t)\overline{f(\tau)x(\tau)}]\,dt\,d\tau = E\left[\int_a^b f(t)x(t)\,dt\int_a^b \overline{f(\tau)x(\tau)}\,d\tau\right] = E\left|\int_a^b f(t)x(t)\,dt\right|^2 > 0$   (2.368)

Let $T_K$ be the Hermitian integral operator associated with $\sigma_x^2(t, \tau) = \overline{\sigma_x^2(\tau, t)}$; its eigenequation is:

$T_K\phi_k(t) = \int_a^b \sigma_x^2(t, \tau)\phi_k(\tau)\,d\tau = \lambda_k\phi_k(t), \quad k = 1, 2, \dots$   (2.369)

where all eigenvalues $\lambda_k > 0$ are real and positive, and the eigenfunctions $\phi_k(t)$ are orthogonal:

$\langle\phi_k(t), \phi_l(t)\rangle = \int_a^b \phi_k(t)\overline{\phi_l(t)}\,dt = \delta[k-l]$   (2.370)

and they form a complete orthogonal basis that spans the vector space.

If the stochastic process $x(t)$ is truncated and sampled, it becomes a random vector $x = [x[1], \dots, x[N]]^T$. The covariance between any two components $x[m]$ and $x[n]$ is

$\sigma_{mn}^2 = E[x[m]\overline{x[n]}] = \overline{E[x[n]\overline{x[m]}]} = \overline{\sigma_{nm}^2}, \quad (m, n = 1, \dots, N)$   (2.371)

which is a discrete Hermitian kernel, and the associated operator is the $N$ by $N$ covariance matrix:

$\Sigma_x = E(xx^*) = \begin{bmatrix}\sigma_{11}^2 & \sigma_{12}^2 & \cdots & \sigma_{1N}^2\\ \sigma_{21}^2 & \sigma_{22}^2 & \cdots & \sigma_{2N}^2\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{N1}^2 & \sigma_{N2}^2 & \cdots & \sigma_{NN}^2\end{bmatrix}$   (2.372)

The eigenequation of this operator is:

$\Sigma_x\phi_k = \lambda_k\phi_k, \quad (k = 1, \dots, N)$   (2.373)

As $\Sigma_x = \Sigma_x^*$ is Hermitian (symmetric if $x$ is real) and positive definite, its eigenvalues $\lambda_k$ are all real and positive, and the eigenvectors are orthogonal:

$\langle\phi_k, \phi_l\rangle = \phi_l^*\phi_k = \delta[k-l], \quad (k, l = 1, \dots, N)$   (2.374)

and they form a unitary matrix $\Phi = [\phi_1, \dots, \phi_N]$ satisfying $\Phi^{-1} = \Phi^*$, i.e., $\Phi^*\Phi = I$. Eq. 2.373 can also be written in the following forms:

$\Sigma_x\Phi = \Phi\Lambda, \quad \Phi^*\Sigma_x\Phi = \Lambda, \quad \Sigma_x = \Phi\Lambda\Phi^* = \sum_{k=1}^N \lambda_k\phi_k\phi_k^*$   (2.375)
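These relations, and the decorrelation they imply, can be demonstrated numerically with a sample covariance matrix, anticipating the Karhunen-Loeve theorems below. In this sketch the correlated data are synthesized from an arbitrary mixing matrix, and the sample covariance plays the role of $\Sigma_x$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthesize zero-mean correlated samples x = M s, with M an arbitrary mixing matrix
N, trials = 4, 10000
M = rng.standard_normal((N, N))
x = M @ rng.standard_normal((N, trials))       # each column is one realization

Sigma_x = (x @ x.T) / trials                    # sample covariance (mu_x = 0)
lam, Phi = np.linalg.eigh(Sigma_x)              # Eq. (2.373)

print(np.all(lam > 0))                          # real positive eigenvalues

# Eq. (2.375), the discrete Mercer expansion of the covariance kernel
Sigma_rec = sum(l * np.outer(p, p) for l, p in zip(lam, Phi.T))
print(np.allclose(Sigma_x, Sigma_rec))

# Decorrelation (the Karhunen-Loeve transform of the theorems below): c = Phi* x
c = Phi.T @ x
Sigma_c = (c @ c.T) / trials
print(np.allclose(Sigma_c, np.diag(lam)))       # Phi* Sigma_x Phi = Lambda, diagonal
```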

Theorem 2.15. (Karhunen-Loeve Theorem, continuous) Let $\sigma_x^2(t, \tau)$ be the covariance of a centered stochastic process $x(t)$ with $\mu_x = E(x(t)) = 0$, and let $\lambda_k$ and $\phi_k(t)$ be respectively the $k$th eigenvalue and the corresponding eigenfunction of the integral operator associated with $\sigma_x^2(t, \tau)$ as a kernel:

$T_K\phi_k(t) = \int_a^b \sigma_x^2(t, \tau)\phi_k(\tau)\,d\tau = \lambda_k\phi_k(t), \quad \text{for all } k$   (2.376)

Then $x(t)$ can be series expanded as:

$x(t) = \sum_{k=1}^\infty c[k]\phi_k(t)$   (2.377)

where $c[k]$ is the $k$th random coefficient, given by

$c[k] = \langle x(t), \phi_k(t)\rangle = \int_a^b x(t)\overline{\phi_k(t)}\,dt, \quad k = 1, 2, \dots$   (2.378)

These coefficients are centered (zero mean), $E(c[k]) = 0$, and uncorrelated:

$Cov(c[k], c[l]) = \lambda_k\delta[k-l]$   (2.379)

Proof: As $\sigma_x^2(t, \tau)$ is self-adjoint, the eigenfunctions $\phi_k(t)$ of the associated operator $T_K$ form a complete orthogonal basis; therefore any given stochastic process $x(t)$ can be represented as a linear combination of the $\phi_k(t)$, i.e., Eq. 2.377 holds. Taking the inner product with $\phi_l(t)$ on both sides of Eq. 2.377, we get:

$\langle x(t), \phi_l(t)\rangle = \int_a^b x(t)\overline{\phi_l(t)}\,dt = \sum_{k=1}^\infty c[k]\langle\phi_k(t), \phi_l(t)\rangle = \sum_{k=1}^\infty c[k]\delta[k-l] = c[l]$   (2.380)

This is Eq. 2.378. The expectation of this coefficient is indeed zero:

$E(c[k]) = E\left[\int_a^b x(t)\overline{\phi_k(t)}\,dt\right] = \int_a^b E[x(t)]\overline{\phi_k(t)}\,dt = 0$   (2.381)

Finally we show that Eq. 2.379 holds:

$Cov(c[k], c[l]) = E(c[k]\overline{c[l]}) = E\left[\int_a^b x(t)\overline{\phi_k(t)}\,dt\,\overline{\int_a^b x(\tau)\overline{\phi_l(\tau)}\,d\tau}\right] = \int_a^b\int_a^b E[x(t)\overline{x(\tau)}]\,\overline{\phi_k(t)}\phi_l(\tau)\,dt\,d\tau$
$= \int_a^b\left[\int_a^b \sigma_x^2(t, \tau)\phi_l(\tau)\,d\tau\right]\overline{\phi_k(t)}\,dt = \int_a^b \lambda_l\phi_l(t)\overline{\phi_k(t)}\,dt = \lambda_l\delta[k-l] = \lambda_k\delta[k-l]$   (2.382)

Q.E.D.

When the centered stochastic process $x(t)$ is truncated and sampled to become a finite random vector $x = [x[1], \dots, x[N]]^T$ with $E(x) = \mu_x = 0$, the Karhunen-Loeve theorem takes the following discrete form:

Theorem 2.16. (Karhunen-Loeve Theorem, discrete) Let $\Sigma_x$ be the covariance matrix of a centered random vector $x$ with $\mu_x = E(x) = 0$, and let $\lambda_k$ and $\phi_k$ be respectively the $k$th eigenvalue and the corresponding eigenvector of $\Sigma_x$:

$\Sigma_x\phi_k = \lambda_k\phi_k, \quad k = 1, \dots, N$   (2.383)

Then $x$ can be series expanded as:

$x = \sum_{k=1}^N c[k]\phi_k$   (2.384)

where $c[k]$ is the $k$th random coefficient, given by

$c[k] = \langle x, \phi_k\rangle = \phi_k^*x, \quad k = 1, \dots, N$   (2.385)

These coefficients are centered (zero mean), $E(c[k]) = 0$, and uncorrelated:

$Cov(c[k], c[l]) = \lambda_k\delta[k-l]$   (2.386)

Proof: As the covariance matrix $\Sigma_x$ is Hermitian and positive definite, its eigenvalues $\lambda_k$ are all real and positive, and its eigenvectors $\phi_k$ form a complete orthogonal system by which any $x$ can be series expanded as:

$x = \sum_{k=1}^N c[k]\phi_k = \Phi c$   (2.387)

where $c = [c[1], \dots, c[N]]^T$ is a random vector formed by the $N$ coefficients and $\Phi = [\phi_1, \dots, \phi_N]$, i.e., Eq. 2.384 holds. To obtain these coefficients, we pre-multiply both sides by $\Phi^{-1} = \Phi^*$ to get:

$\Phi^*x = \Phi^*\Phi c = c, \quad \text{i.e.} \quad c[k] = \langle x, \phi_k\rangle = \phi_k^*x, \quad (k = 1, \dots, N)$   (2.388)

The mean vector of $c$ is indeed zero:

$\mu_c = E(c) = E(\Phi^*x) = \Phi^*E(x) = 0$   (2.389)

and the covariance matrix of $c$ is:

$\Sigma_c = E(cc^*) = E[(\Phi^*x)(\Phi^*x)^*] = E[\Phi^*xx^*\Phi] = \Phi^*E(xx^*)\Phi = \Phi^*\Sigma_x\Phi = \Lambda$   (2.390)

The covariance matrix $\Sigma_c = \Lambda$ is diagonalized:

$\sigma_{kl}^2 = \lambda_k\delta[k-l], \quad (k, l = 1, \dots, N)$   (2.391)

This is Eq. 2.386. Q.E.D.

We see that the variance $\sigma_k^2$ of the $k$th coefficient $c[k]$ is the $k$th eigenvalue $\lambda_k$ corresponding to the $k$th eigenvector $\phi_k$, and the random signal $x$ is decorrelated

by the transformation $c = \Phi^*x$ in Eq. 2.388, as the components $c[k]$ and $c[l]$ of the resulting random signal $c$ are no longer correlated: $E(c[k]\overline{c[l]}) = 0$ for $k \ne l$.

Comparing the generalized Fourier expansion in Eqs. 2.88 and 2.89 with the Karhunen-Loeve series expansion in Eqs. 2.377 and 2.378, we see that they are identical in form. However, we need to make it clear that the former is for a deterministic signal with a set of pre-determined basis functions $\phi_k(t)$, while the latter is for a stochastic signal whose basis functions $\phi_k(t)$ are the eigenfunctions of the integral operator associated with the covariance function of the stochastic process, and are therefore dependent on the specific signal being considered. Also note that Eqs. 2.384 and 2.385 are simply the discrete versions of Eqs. 2.377 and 2.378. The Karhunen-Loeve theorem and the associated series expansion will be considered in Chapter ??.

2.6 Summary

We summarize below the most essential points discussed in this chapter, based on which the various orthogonal transform methods to be specifically discussed in the following chapters will all be looked at from a unified point of view.

A time signal can be considered as a vector $x \in H$ in a Hilbert space, the specific type of which depends on the nature of the signal. For example, a continuous signal $x(t)$ over a time interval $a < t < b$ is a vector $x = x(t)$ in $L^2$ space, and its discrete samples form a vector $x = [\dots, x[n], \dots]^T$ in $l^2$ space. Moreover, if the signal is truncated to become a set of $N$ samples, then $x = [x[1], \dots, x[N]]^T$ is a vector in $C^N$ space.

The signal vector $x$ can be represented as a linear combination of a set of either countable basis vectors $b_n$ or uncountable basis functions $b(t)$ spanning the space in which it resides. In particular, if the basis is orthonormal, we have:

$x = \sum_n c[n]b_n = \sum_n \langle x, b_n\rangle b_n$   (2.392)

or

$x = \int c(f)b(f)\,df = \int \langle x, b(f)\rangle b(f)\,df$   (2.393)

Here $c[n] = \langle x, b_n\rangle$ or $c(f) = \langle x, b(f)\rangle$ is the weighting coefficient or function, representing the analysis of the signal by which the signal is decomposed into a set of components $c[n]b_n$ or $c(f)b(f)$, and the summation or integration is the synthesis of the signal by which the signal is reconstructed from its components.

A signal vector given in the default form of a time function or a sequence of discrete values can be considered as a sequence of weighted and shifted time

$$x(t) = \int x(\tau)\,\delta(t - \tau)\,d\tau, \quad \text{(for all } t\text{)} \qquad (2.394)$$

or

$$x[n] = \sum_m x[m]\,\delta[n - m], \quad \text{(for all } n\text{)} \qquad (2.395)$$

where $\delta(t - \tau)$ and $\delta[n - m]$ can be considered respectively as the standard basis of the corresponding signal space, which is always implicitly used in the default representation of a time signal. In other words, the default form of a signal $x(t)$ or $x[n]$ is actually a weighting function (uncountable) or a set of coefficients (countable) of the standard basis vectors.

- The signal vector can be represented by any of the infinitely many bases spanning the same space. For example, any unitary transformation of the standard basis will result in a particular orthogonal basis, a rotated version of the standard basis. (The standard basis itself corresponds to the identity transformation.) Here we only consider orthogonal bases. For a continuous signal $x(t)$ we have

$$x(t) = \int c(f)\,\phi_f(t)\,df, \quad \text{(for all } t\text{)}$$
$$c(f) = \langle x(t), \phi_f(t)\rangle = \int x(t)\,\overline{\phi_f(t)}\,dt, \quad \text{(for all } f\text{)} \qquad (2.396)$$

The first equation expresses the signal $x(t)$ as a linear combination of a set of uncountable basis functions $\phi_f(t)$ (sometimes also written as $\phi(t, f)$). The second equation, also called an integral transform of $x(t)$, gives the coefficient function $c(f)$ of the linear combination as the projection of $x(t)$ onto the basis function $\phi_f(t)$, also called the kernel function of the transform. For a discrete signal $\mathbf{x} = [\dots, x[n], \dots]^T$ we have

$$\mathbf{x} = \sum_n c[n]\,\mathbf{b}_n, \quad\text{or}\quad x[m] = \sum_n c[n]\,b[m, n], \quad \text{(for all } m\text{)}$$
$$c[n] = \langle \mathbf{x}, \mathbf{b}_n\rangle = \sum_m x[m]\,\overline{b[m, n]}, \quad \text{(for all } n\text{)} \qquad (2.397)$$

where $x[m]$ is the $m$th element of $\mathbf{x}$, and $b[m, n]$ is the $m$th element of the $n$th basis vector $\mathbf{b}_n$. The first equation expresses the signal vector as a linear combination of a set of countable basis vectors $\mathbf{b}_n$ (or, in component form, $b[m, n]$) for all $n$; the second equation gives the $n$th coefficient $c[n]$ as the projection of the signal $\mathbf{x}$ onto the corresponding basis vector $\mathbf{b}_n$. Both pairs of equations above are unitary (orthogonal if real) transformations. In either case, the second equation is the forward transform, which converts the time signal given under the implicit standard basis into a continuous coefficient function or a set of discrete coefficients for a new basis, while the first equation is the inverse transform, which represents the signal as a linear combination of the new basis weighted by the coefficients.
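The analysis/synthesis pair in Eq. 2.397 can be checked numerically with any unitary matrix. The following small sketch (an illustrative example, using the unitary DFT matrix as the orthonormal basis; any other unitary basis would serve equally well) verifies both perfect reconstruction and norm conservation:

```python
import numpy as np

# Analysis c[n] = <x, b_n> and synthesis x = sum_n c[n] b_n in C^N,
# with the columns of B as an orthonormal basis (unitary DFT matrix).
N = 8
n = np.arange(N)
B = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)

x = np.array([65, 60, 65, 70, 75, 80, 75, 70], dtype=complex)

c = B.conj().T @ x                     # forward transform (analysis)
x_rec = B @ c                          # inverse transform (synthesis)

print(np.allclose(x, x_rec))           # True: perfect reconstruction
print(np.linalg.norm(x), np.linalg.norm(c))   # equal norms (Parseval)
```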

- The representations of the signal under different bases are equivalent, in the sense that the total amount of energy or information contained in the signal, represented by the norm of the vector, is conserved. This is because any two orthogonal bases are always related by a unitary transformation (a rotation), which conserves vector norms according to Parseval's equality.

In the rest of the book we will study various orthogonal transforms, each representing a given signal vector by the weighting coefficients or weighting function of the corresponding basis used. The topics of interest in the future discussion include: why such a unitary transformation is desirable in the first place; why it can represent a given signal in such a way that the signal can be most effectively and conveniently processed, analyzed, and compressed for transmission and storage, and the information of interest extracted; and how to find the optimal transformation according to certain quantifiable criteria.

In addition to the orthogonal transformations based on orthogonal basis vectors or functions, each of which carries some independent information about the signal, we will also consider certain non-orthogonal basis functions, or even non-independent vectors. Specifically, the frames discussed previously will be used in the wavelet transforms. In such cases, the vectors used for representing the signal may be correlated, and there may exist certain redundancy in terms of the signal information they each carry. There are both pros and cons in such signal representations with redundancy.

2.7 Homework Problems

1. Approximate a given 3-D vector $\mathbf{x} = [1, 2, 3]^T$ in the 2-D subspace spanned by the two standard basis vectors $\mathbf{e}_1 = [1, 0, 0]^T$ and $\mathbf{e}_2 = [0, 1, 0]^T$. Obtain the error vector $\Delta\mathbf{x}$ and verify that it is orthogonal to both $\mathbf{e}_1$ and $\mathbf{e}_2$.

2. Repeat the problem above, but now approximate the same 3-D vector $\mathbf{x} = [1, 2, 3]^T$ in a different 2-D subspace spanned by the two basis vectors $\mathbf{a}_1 = [1, 0, 1]^T$ and $\mathbf{a}_2 = [-1, 2, 0]^T$. Find a vector $\hat{\mathbf{x}} = c[1]\mathbf{a}_1 + c[2]\mathbf{a}_2$ in this 2-D subspace so that the error $\|\mathbf{x} - \hat{\mathbf{x}}\|$ is minimized.

3. Given two vectors $\mathbf{u}_1 = [2, 1]^T/\sqrt{5}$ and $\mathbf{u}_2 = [-1, 2]^T/\sqrt{5}$ in $\mathbb{R}^2$, do the following:
a. Verify that they are orthogonal;
b. Normalize them;
c. Use them as an orthonormal basis to represent the vector $\mathbf{x} = [1, 2]^T$.

4. Use the Gram-Schmidt orthogonalization process to construct two new orthonormal basis vectors $\mathbf{b}_1$ and $\mathbf{b}_2$ from the two vectors $\mathbf{a}_1$ and $\mathbf{a}_2$ used in the previous problem, so that they span the same 2-D space, and then approximate the vector $\mathbf{x} = [1, 2, 3]^T$ above. Note that as the off-diagonal elements of the resulting 2 by 2 matrix are zero and both elements on the main diagonal are one, the coefficients $c[1]$ and $c[2]$ can be easily found without solving a linear equation system.

5. Approximate the function $x(t) = t^2$ defined over the interval $[0, 1]$ in a 2-D space spanned by two basis functions $a_1(t)$ and $a_2(t)$:

$$a_1(t) = 1, \qquad a_2(t) = \begin{cases} 0 & (0 \le t < 1/2)\\ 1 & (1/2 \le t < 1)\end{cases} \qquad (2.398)$$

6. Repeat the problem above with the same $a_1$ but a different $a_2$ defined as

$$a_2(t) = \begin{cases} 1 & (0 \le t < 1/2)\\ -1 & (1/2 \le t < 1)\end{cases} \qquad (2.399)$$

Note that $a_1$ and $a_2$ are orthogonal, $\langle a_1(t), a_2(t)\rangle = 0$ (they are actually the first two basis functions of the orthogonal Walsh-Hadamard transform, to be discussed in detail later).

7. Repeat the problem above, but now with an additional basis function $a_3$ defined as

$$a_3(t) = \begin{cases} 1 & (0 \le t < 1/4)\\ -1 & (1/4 \le t < 3/4)\\ 1 & (3/4 \le t < 1)\end{cases} \qquad (2.400)$$

so that the 2-D space is expanded to a 3-D space spanned by $a_1(t)$, $a_2(t)$, and $a_3(t)$ (they are actually the first three basis functions of the Walsh-Hadamard transform).

8. Approximate the same function $x(t) = t^2$ above in a 3-D space spanned by the three basis functions $a_0(t) = 1$, $a_1(t) = \sqrt{2}\cos(\pi t)$, and $a_2(t) = \sqrt{2}\cos(2\pi t)$, defined over the same time period. These happen to be the first three basis functions of the cosine transform. Hint: the following integral may be needed:

$$\int x^2\cos(ax)\,dx = \frac{2x\cos(ax)}{a^2} + \frac{a^2x^2 - 2}{a^3}\sin(ax) + C \qquad (2.401)$$

9. Consider a 2-D space spanned by the two orthonormal basis vectors

$$\mathbf{a}_1 = \frac{1}{2}\begin{bmatrix}\sqrt{3}\\ 1\end{bmatrix}, \qquad \mathbf{a}_2 = \frac{1}{2}\begin{bmatrix}-1\\ \sqrt{3}\end{bmatrix} \qquad (2.402)$$

a. Represent the vector $\mathbf{x} = [1, 2]^T$ under this basis as $\mathbf{x} = c[1]\mathbf{a}_1 + c[2]\mathbf{a}_2$. Find $c[1]$ and $c[2]$.
b. Represent a counterclockwise rotation of $\theta = 30^\circ$ by a 2 by 2 matrix $\mathbf{R}$.
c. Rotate the vector $\mathbf{x}$ to get $\mathbf{y} = \mathbf{R}\mathbf{x}$.
d. Represent $\mathbf{y}$ above under the basis $\{\mathbf{a}_1, \mathbf{a}_2\}$ as $\mathbf{y} = d[1]\mathbf{a}_1 + d[2]\mathbf{a}_2$. Find the two coefficients $d[1]$ and $d[2]$.
e. Rotate the basis $\{\mathbf{a}_1, \mathbf{a}_2\}$ in the opposite direction, $\theta = -30^\circ$, as represented by $\mathbf{R}^{-1} = \mathbf{R}^T$, to get $\mathbf{b}_1 = \mathbf{R}^{-1}\mathbf{a}_1$ and $\mathbf{b}_2 = \mathbf{R}^{-1}\mathbf{a}_2$.
f. Represent $\mathbf{x}$ under this new basis $\{\mathbf{b}_1, \mathbf{b}_2\}$ (which happens to be the standard basis) as $\mathbf{x} = d'[1]\mathbf{b}_1 + d'[2]\mathbf{b}_2$.
g. Verify that $d'[1] = d[1]$ and $d'[2] = d[2]$; in other words, the representation $\{d[1], d[2]\}$ of the rotated vector $\mathbf{y}$ under the original basis $\{\mathbf{a}_1, \mathbf{a}_2\}$ is equivalent

to the representation $\{d'[1], d'[2]\}$ of the original vector $\mathbf{x}$ under the inversely rotated basis $\{\mathbf{b}_1, \mathbf{b}_2\}$.

10. In Example 2.8 we approximated the temperature signal, an 8-D vector $\mathbf{x} = [65, 60, 65, 70, 75, 80, 75, 70]^T$, in a 3-D subspace spanned by three orthogonal basis vectors. This process can be continued by increasing the dimensionality from 3 to 8, so that the approximation error is progressively reduced, eventually reaching zero when the signal vector is represented in the entire 8-D vector space. Consider the 8 orthogonal basis vectors given as the row vectors of the following matrix (the Walsh-Hadamard transform matrix):

$$\mathbf{H}_w = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1\\ 1 & 1 & -1 & -1 & -1 & -1 & 1 & 1\\ 1 & 1 & -1 & -1 & 1 & 1 & -1 & -1\\ 1 & -1 & -1 & 1 & 1 & -1 & -1 & 1\\ 1 & -1 & -1 & 1 & -1 & 1 & 1 & -1\\ 1 & -1 & 1 & -1 & -1 & 1 & -1 & 1\\ 1 & -1 & 1 & -1 & 1 & -1 & 1 & -1 \end{bmatrix} \qquad (2.403)$$

Note that the first three rows are used in the example. Now approximate the same signal using 1 to all 8 rows as the basis vectors. Plot the original signal and its approximation in the $k$-D subspaces for $k = 1, 2, \dots, 8$, adding one dimension at a time for more detailed variations in the signal. Find the coefficients $c[k]$ and the error in each case. Consider using some software tool such as Matlab.

11. The same temperature signal in Example 2.8, $\mathbf{x} = [65, 60, 65, 70, 75, 80, 75, 70]^T$, can also be approximated using a set of different basis vectors obtained by sampling the cosine functions

$$a_0(t) = 1, \quad a_1(t) = \sqrt{2}\cos(\pi t), \quad a_2(t) = \sqrt{2}\cos(2\pi t), \;\dots \qquad (2.404)$$

at 8 equally spaced points $t_n = 1/16 + n/8$ $(n = 0, 1, \dots, 7)$. The resulting vectors are actually those used in the discrete cosine transform, to be discussed later. Find the coefficients $c[k]$ and the error for each approximation in a $k$-D subspace $(k = 1, 2, \dots, 8)$, and plot the original signal together with the approximation in each case. Use a software tool such as Matlab.

12. Consider a frame in $\mathbb{R}^2$ containing three vectors that form a frame matrix:

$$\mathbf{F} = [\mathbf{f}_1, \mathbf{f}_2, \mathbf{f}_3] = \begin{bmatrix} 1 & 0 & 1\\ 0 & 1 & 1\end{bmatrix} \qquad (2.405)$$

Find the eigenvalues of $\mathbf{F}\mathbf{F}^T$ and its inverse $(\mathbf{F}\mathbf{F}^T)^{-1}$. Find the dual frame $\tilde{\mathbf{F}} = [\tilde{\mathbf{f}}_1, \tilde{\mathbf{f}}_2, \tilde{\mathbf{f}}_3]$. Find the coefficient vectors $\mathbf{c} = [c[1], c[2], c[3]]$ and $\mathbf{d} = [d[1], d[2], d[3]]$ for representing $\mathbf{x} = [1, 2]^T$ so that

$$\mathbf{x} = \sum_n c[n]\,\tilde{\mathbf{f}}_n = \sum_n d[n]\,\mathbf{f}_n \qquad (2.406)$$

Verify that $\mathbf{x}$ can indeed be perfectly reconstructed. Verify Eqs. 2.34 and 2.35.

13. Consider a frame in $\mathbb{R}^2$ containing two vectors that form a frame matrix:

$$\mathbf{F} = [\mathbf{f}_1, \mathbf{f}_2] = \begin{bmatrix} 2 & 1\\ 1 & 2\end{bmatrix} \qquad (2.407)$$

As $\mathbf{f}_1$ and $\mathbf{f}_2$ are linearly independent, they form a Riesz basis. Find the dual frame and verify that the two frames are biorthonormal. Given $\mathbf{x} = [2, 3]^T$, find the coefficient vectors $\mathbf{c}$ and $\mathbf{d}$ such that

$$\mathbf{x} = \sum_n c[n]\,\tilde{\mathbf{f}}_n = \sum_n d[n]\,\mathbf{f}_n \qquad (2.408)$$

Verify that $\mathbf{x}$ can indeed be perfectly reconstructed. Verify Eqs. 2.34 and 2.35.

14. Given the following basis in $\mathbb{R}^3$:

$$\mathbf{f}_1 = \begin{bmatrix}1\\0\\0\end{bmatrix}, \quad \mathbf{f}_2 = \begin{bmatrix}1\\1\\0\end{bmatrix}, \quad \mathbf{f}_3 = \begin{bmatrix}1\\1\\1\end{bmatrix} \qquad (2.409)$$

find its biorthogonal dual $\tilde{\mathbf{f}}_1, \tilde{\mathbf{f}}_2, \tilde{\mathbf{f}}_3$, and the two sets of coefficients $c[k]$ and $d[k]$ $(k = 1, 2, 3)$ to represent the vector $\mathbf{x} = [1, 2, 3]^T$.
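For the frame problems above, the central computation is the dual frame $\tilde{\mathbf{F}} = (\mathbf{F}\mathbf{F}^T)^{-1}\mathbf{F}$. The following minimal sketch (using the frame matrix of problem 12 as printed above; substitute any other frame to experiment) verifies the two reconstruction formulas of Eq. 2.406:

```python
import numpy as np

# Dual-frame computation and perfect reconstruction in R^2.
F = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])        # frame matrix [f1, f2, f3]

S = F @ F.T                            # frame operator F F^T
F_dual = np.linalg.inv(S) @ F          # dual frame [f~1, f~2, f~3]

x = np.array([1.0, 2.0])
c = F.T @ x                            # c[n] = <x, f_n>
d = F_dual.T @ x                       # d[n] = <x, f~_n>

print(F_dual @ c)                      # = x: reconstruction x = sum_n c[n] f~_n
print(F @ d)                           # = x: reconstruction x = sum_n d[n] f_n
```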

3 Continuous-Time Fourier Transform

3.1 The Fourier Series Expansion of Periodic Signals

3.1.1 Formulation of the Fourier Expansion

As considered in the previous chapter, the second-order differential operator $D^2$ over the interval $[0, T]$ is a self-adjoint operator, and its eigenfunctions $\phi_k(t) = e^{j2k\pi t/T}/\sqrt{T}$ $(k = 0, \pm1, \pm2, \dots)$ are orthonormal (Eq. 2.183, i.e., Eq. 2.13):

$$\langle \phi_k(t), \phi_l(t)\rangle = \frac{1}{T}\int_T e^{j2k\pi t/T}\,e^{-j2l\pi t/T}\,dt = \frac{1}{T}\int_T e^{j2(k-l)\pi t/T}\,dt = \delta[k-l] \qquad (3.1)$$

They form a complete orthogonal system that spans a function space over the interval $[0, T]$. Any periodic signal $x_T(t) = x_T(t + T)$ in this space can be expressed as a linear combination of these basis functions:

$$x_T(t) = \sum_{k=-\infty}^{\infty}X[k]\,\phi_k(t) = \frac{1}{\sqrt{T}}\sum_{k=-\infty}^{\infty}X[k]\,e^{j2k\pi t/T} \qquad (3.2)$$

Here a periodic signal is denoted by $x_T(t)$, with the subscript $T$ indicating its period. However, this subscript may be dropped for simplicity when no confusion can be caused. Note that at the two boundary points $t = 0$ and $t = T$ the summation on the right-hand side always takes the same value $\sum_{k=-\infty}^{\infty}X[k]/\sqrt{T}$. Consequently, at these two points the reconstructed signal in Eq. 3.2 may not be the same as the original signal $x_T(t)$ if $x_T(0) \ne x_T(T)$.

Due to the orthogonality of these basis functions, the $l$th coefficient $X[l]$ can be found by taking an inner product with $\phi_l(t) = e^{j2l\pi t/T}/\sqrt{T}$ on both sides of the equation above:

$$\langle x_T(t), \phi_l(t)\rangle = \langle x_T(t), e^{j2l\pi t/T}/\sqrt{T}\rangle = \frac{1}{T}\sum_{k=-\infty}^{\infty}X[k]\,\langle e^{j2k\pi t/T}, e^{j2l\pi t/T}\rangle = \sum_{k=-\infty}^{\infty}X[k]\,\delta[k-l] = X[l] \qquad (3.3)$$

i.e., the $k$th coefficient $X[k]$ is the projection of the function $x_T(t)$ onto the $k$th basis function $\phi_k(t)$:

$$X[k] = \langle x_T(t), \phi_k(t)\rangle = \frac{1}{\sqrt{T}}\int_T x_T(t)\,e^{-j2k\pi t/T}\,dt \qquad (3.4)$$

Figure 3.1 Fourier series expansion of periodic signals

Equations 3.2 and 3.4 form the Fourier series expansion, which can also be written as the following pair:

$$X[k] = \mathcal{F}[x_T(t)] = \frac{1}{\sqrt{T}}\int_T x_T(t)\,e^{-j2k\pi t/T}\,dt = \langle x_T(t), e^{j2k\pi t/T}/\sqrt{T}\rangle, \quad (k = 0, \pm1, \pm2, \dots)$$

$$x_T(t) = \mathcal{F}^{-1}[X[k]] = \frac{1}{\sqrt{T}}\sum_{k=-\infty}^{\infty}X[k]\,e^{j2k\pi t/T} = \sum_{k=-\infty}^{\infty}\langle x_T(t), e^{j2k\pi t/T}/\sqrt{T}\rangle\;e^{j2k\pi t/T}/\sqrt{T} \qquad (3.5)$$

As the signal and the basis functions are both periodic, the integral above can be over any interval of length $T$, such as $[0, T]$ or $[-T/2, T/2]$.

As defined in Eq. 2.175, we have $1/T = f_0$ and $2\pi/T = 2\pi f_0 = \omega_0$, so that the basis functions can also be written as $\phi_k(t) = e^{j2k\pi f_0 t}/\sqrt{T} = e^{jk\omega_0 t}/\sqrt{T}$. We will use these equivalent expressions interchangeably, whichever is most convenient in the specific discussion. Moreover, in practice the constant scaling factor $1/\sqrt{T}$ in the equations above has little significance, and the Fourier series expansion pair can therefore be expressed in alternative forms such as

$$x_T(t) = \sum_{k=-\infty}^{\infty}X[k]\,e^{j2k\pi f_0 t} = \sum_{k=-\infty}^{\infty}X[k]\,e^{jk\omega_0 t}, \qquad X[k] = \frac{1}{T}\int_T x_T(t)\,e^{-j2k\pi f_0 t}\,dt = \frac{1}{T}\int_T x_T(t)\,e^{-jk\omega_0 t}\,dt \qquad (3.6)$$

In this form, $X[0] = \frac{1}{T}\int_T x_T(t)\,dt$ has a clear interpretation: it is the average, offset, or DC (direct current) component of the signal.

The Fourier series expansion is a unitary transformation that converts a function $x_T(t)$ in the vector space of all periodic time functions into a vector $[\dots, X[-1], X[0], X[1], \dots]^T$ in another space of all vectors with infinitely many elements (components).
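Before moving on, the coefficient formula can be sanity-checked numerically. The sketch below (an illustrative computation, using the unnormalized form of Eq. 3.6 and a simple Riemann sum in place of the integral) recovers the expected coefficients $X[\pm 3] = 1/2$ of a pure third-harmonic cosine:

```python
import numpy as np

# Numerical Fourier series coefficients X[k] = (1/T) int_T x(t) e^{-j2 pi k t/T} dt.
T = 1.0
t = np.linspace(0.0, T, 1000, endpoint=False)
dt = t[1] - t[0]
x = np.cos(2 * np.pi * 3 * t / T)          # third harmonic of period T

def X(k):
    return np.sum(x * np.exp(-2j * np.pi * k * t / T)) * dt / T

for k in range(-4, 5):
    print(k, np.round(X(k), 4))            # 0.5 at k = +/-3, ~0 elsewhere
```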

Moreover, the inner product of any two functions $x_T(t)$ and $y_T(t)$ remains the same before and after the unitary transformation:

$$\langle x_T(t), y_T(t)\rangle = \int_T x_T(t)\,\overline{y_T(t)}\,dt = \frac{1}{T}\int_T\left[\sum_{k=-\infty}^{\infty}X[k]\,e^{j2k\pi f_0 t}\right]\left[\sum_{l=-\infty}^{\infty}\overline{Y[l]}\,e^{-j2l\pi f_0 t}\right]dt$$
$$= \sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty}X[k]\,\overline{Y[l]}\,\frac{1}{T}\int_T e^{j2(k-l)\pi f_0 t}\,dt = \sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty}X[k]\,\overline{Y[l]}\,\delta[k-l] = \sum_{k=-\infty}^{\infty}X[k]\,\overline{Y[k]} = \langle \mathbf{X}, \mathbf{Y}\rangle \qquad (3.7)$$

In particular, if $y_T(t) = x_T(t)$, the above becomes Parseval's identity:

$$\|x_T(t)\|^2 = \langle x_T(t), x_T(t)\rangle = \langle \mathbf{X}, \mathbf{X}\rangle = \|\mathbf{X}\|^2 \qquad (3.8)$$

indicating that the total energy or information contained in the signal is conserved by the Fourier series expansion; the signal can therefore be equivalently represented in either the time or the frequency domain.

3.1.2 Physical Interpretation

The Fourier series expansion of a periodic signal $x_T(t)$ can also be expressed in terms of sine and cosine functions:

$$x_T(t) = \sum_{k=-\infty}^{\infty}X[k]\,e^{jk\omega_0 t} = X[0] + \sum_{k=1}^{\infty}\left[X[-k]\,e^{-jk\omega_0 t} + X[k]\,e^{jk\omega_0 t}\right]$$
$$= X[0] + \sum_{k=1}^{\infty}\left[X[-k](\cos k\omega_0 t - j\sin k\omega_0 t) + X[k](\cos k\omega_0 t + j\sin k\omega_0 t)\right]$$
$$= X[0] + \sum_{k=1}^{\infty}\left[(X[k] + X[-k])\cos k\omega_0 t + j(X[k] - X[-k])\sin k\omega_0 t\right]$$
$$= X[0] + 2\sum_{k=1}^{\infty}\left(a_k\cos k\omega_0 t + b_k\sin k\omega_0 t\right) \qquad (3.9)$$

Here we have defined $a_k = (X[k] + X[-k])/2$ and $b_k = j(X[k] - X[-k])/2$, which can also be expressed as (Eq. 3.6)

$$a_k = \frac{1}{2T}\int_T x_T(t)\left[e^{-jk\omega_0 t} + e^{jk\omega_0 t}\right]dt = \frac{1}{T}\int_T x_T(t)\cos(k\omega_0 t)\,dt$$
$$b_k = \frac{j}{2T}\int_T x_T(t)\left[e^{-jk\omega_0 t} - e^{jk\omega_0 t}\right]dt = \frac{1}{T}\int_T x_T(t)\sin(k\omega_0 t)\,dt, \qquad (k = 1, 2, \dots) \qquad (3.10)$$

The two equations above are the alternative form of the Fourier series expansion of $x_T(t)$.

In particular, if $x_T(t)$ is real, we have

$$X[-k] = \frac{1}{T}\int_T x_T(t)\,e^{j2k\pi f_0 t}\,dt = \overline{X[k]} \qquad (3.11)$$

which means

$$\mathrm{Re}[X[-k]] = \mathrm{Re}[X[k]], \qquad \mathrm{Im}[X[-k]] = -\mathrm{Im}[X[k]] \qquad (3.12)$$

i.e., the real part of $X[k]$ is even and the imaginary part is odd. Now we have

$$a_k = \frac{X[k] + X[-k]}{2} = \frac{X[k] + \overline{X[k]}}{2} = \mathrm{Re}[X[k]], \qquad b_k = \frac{j(X[k] - X[-k])}{2} = \frac{j(X[k] - \overline{X[k]})}{2} = -\mathrm{Im}[X[k]] \qquad (3.13)$$

i.e.,

$$\begin{cases}|X[k]| = \sqrt{a_k^2 + b_k^2}\\ \angle X[k] = -\tan^{-1}(b_k/a_k)\end{cases} \qquad \begin{cases}a_k = |X[k]|\cos\angle X[k]\\ b_k = -|X[k]|\sin\angle X[k]\end{cases} \qquad (3.14)$$

and the Fourier series expansion of a real signal $x_T(t)$ (Eq. 3.9) can be rewritten as

$$x_T(t) = X[0] + 2\sum_{k=1}^{\infty}\left(a_k\cos k\omega_0 t + b_k\sin k\omega_0 t\right) = X[0] + 2\sum_{k=1}^{\infty}|X[k]|\left(\cos\angle X[k]\cos k\omega_0 t - \sin\angle X[k]\sin k\omega_0 t\right)$$
$$= X[0] + 2\sum_{k=1}^{\infty}|X[k]|\cos(k\omega_0 t + \angle X[k]) \qquad (3.15)$$

This is yet another form of the Fourier expansion, which indicates that a real periodic signal $x_T(t)$ can be constructed as a superposition of infinitely many sinusoids of (a) different frequencies $k\omega_0$, (b) different amplitudes $|X[k]|$, and (c) different phases $\angle X[k]$. In particular, consider the following values of $k$:

- $k = 0$: the coefficient $X[0] = \frac{1}{T}\int_T x_T(t)\,dt$ is the average or DC component of the signal $x_T(t)$;
- $k = 1$: the sinusoid $\cos(\omega_0 t + \angle X[1])$ has the same period $T$ as the signal $x_T(t)$ and is therefore called the fundamental frequency component of the signal;
- $k > 1$: the frequency of the sinusoid $\cos(k\omega_0 t + \angle X[k])$ is $k$ times that of the fundamental, and this component is called the $k$th harmonic of the signal.

3.1.3 Properties of the Fourier Series Expansion

Here is a set of properties of the Fourier series expansion:

- Linearity:

$$\mathcal{F}[ax(t) + by(t)] = a\,\mathcal{F}[x(t)] + b\,\mathcal{F}[y(t)] \qquad (3.16)$$

As an integral operator, which is by definition linear, the Fourier expansion is obviously linear.

- Time scaling: when $x_T(t)$ is scaled in time by a factor $a > 0$ to become $x(at)$, its period becomes $T/a$ and its fundamental frequency becomes $a/T = af_0$. If $a > 1$, the signal is compressed by a factor $a$ and the frequencies of its fundamental and harmonics become $a$ times higher; if $a < 1$, the signal is expanded and the frequencies of its fundamental and harmonics are $a$ times lower. In either case, the coefficients $X[k]$ remain the same:

$$x(at) = \sum_{k=-\infty}^{\infty}X[k]\,e^{j2ka\pi f_0 t} = \sum_{k=-\infty}^{\infty}X[k]\,e^{jka\omega_0 t} \qquad (3.17)$$

- Time shift: a time signal $x(t)$ shifted in time by $\tau$ becomes $y(t) = x(t - \tau)$. Defining $t' = t - \tau$, we can get its Fourier coefficients as

$$Y[k] = \frac{1}{T}\int_T x(t - \tau)\,e^{-jk\omega_0 t}\,dt = \frac{1}{T}\int_T x(t')\,e^{-jk\omega_0(t' + \tau)}\,dt' = X[k]\,e^{-jk\omega_0\tau} = X[k]\,e^{-j2k\pi f_0\tau} \qquad (3.18)$$

- Differentiation: the Fourier coefficients of the time derivative $y(t) = dx(t)/dt$ can be found, using integration by parts, to be

$$Y[k] = \frac{1}{T}\int_T\left[\frac{d}{dt}x(t)\right]e^{-jk\omega_0 t}\,dt = \frac{1}{T}\left[e^{-jk\omega_0 t}x(t)\Big|_T + jk\omega_0\int_T x(t)\,e^{-jk\omega_0 t}\,dt\right] = jk\omega_0\,X[k] = jk\frac{2\pi}{T}X[k] \qquad (3.19)$$

where the first term inside the brackets vanishes, as both $x(t)$ and $e^{-jk\omega_0 t}$ are periodic with period $T$.

- Integration: the time integration of $x(t)$ is

$$y(t) = \int_0^t x(\tau)\,d\tau \qquad (3.20)$$

Note that $y(t)$ is periodic only if the DC component or average of $x(t)$ is zero, i.e., $X[0] = 0$ (otherwise it would accumulate over time under the integration to form a ramp). Since $x(t) = dy(t)/dt$, according to the differentiation property above, we have $X[k] = jk\frac{2\pi}{T}Y[k]$, i.e.,

$$Y[k] = \frac{T}{j2k\pi}X[k], \qquad (k \ne 0) \qquad (3.21)$$

Note that $Y[0]$ cannot be obtained from this formula, as for $k = 0$ both the numerator and the denominator are zero. However, as the DC component of $y(t)$, $Y[0]$ can be found by the definition:

$$Y[0] = \frac{1}{T}\int_T y(t)\,dt \qquad (3.22)$$

- Parseval's theorem:

$$\frac{1}{T}\int_T |x_T(t)|^2\,dt = \sum_{k=-\infty}^{\infty}|X[k]|^2 \qquad (3.23)$$

This is already given in Eq. 3.8. The left-hand side of the equation represents the average power in $x_T(t)$. Each term on the right-hand side can be written as

$$\frac{1}{T}\int_T\left|X[k]\,e^{j2\pi kf_0 t}\right|^2 dt = \frac{1}{T}\int_T |X[k]|^2\,dt = |X[k]|^2 \qquad (3.24)$$

which represents the average power contained in the $k$th frequency component. Therefore Eq. 3.23 states that the average power in one period of the signal is the sum of the average powers of all of its frequency components, i.e., the power in the signal is conserved in either the time or the frequency domain.

3.1.4 The Fourier Expansion of Typical Functions

Here we consider the Fourier expansion of a set of typical periodic signals.

- Constant: a constant $x(t) = 1$ can be expressed as a complex exponential $x(t) = e^{j0t} = 1$ of zero frequency, with an arbitrary period $T$. The Fourier coefficient for this zero frequency is $X[0] = 1$, while all other coefficients for nonzero frequencies are zero. Alternatively, following the definition, we get (Eq. 1.33)

$$X[k] = \frac{1}{T}\int_T e^{-jk\omega_0 t}\,dt = \delta[k] \qquad (3.25)$$

- Complex exponential: a complex exponential $x(t) = e^{j2\pi f_0 t} = e^{j\omega_0 t}$ (with period $T = 1/f_0 = 2\pi/\omega_0$) has the single coefficient $X[1] = 1$. We can also find $X[k]$ by definition:

$$X[k] = \frac{1}{T}\int_T e^{j\omega_0 t}\,e^{-jk\omega_0 t}\,dt = \frac{1}{T}\int_T e^{j(1-k)\omega_0 t}\,dt = \delta[k-1] = \begin{cases}1 & k = 1\\ 0 & k \ne 1\end{cases} \qquad (3.26)$$

- Sinusoids: the cosine function $x(t) = \cos(2\pi f_0 t) = (e^{j2\pi f_0 t} + e^{-j2\pi f_0 t})/2$ of frequency $f_0$ is periodic with $T = 1/f_0$, and its Fourier coefficients are

$$X[k] = \frac{1}{T}\int_T \cos(2\pi f_0 t)\,e^{-j2\pi kf_0 t}\,dt = \frac{1}{2}\left[\frac{1}{T}\int_T e^{-j2\pi(k-1)f_0 t}\,dt + \frac{1}{T}\int_T e^{-j2\pi(k+1)f_0 t}\,dt\right] = \frac{1}{2}\left(\delta[k-1] + \delta[k+1]\right) \qquad (3.27)$$

In particular, when $f_0 = 0$, $x(t) = 1$ and $X[k] = \delta[k]$, an impulse at zero frequency representing the constant (zero-frequency) value. Similarly, the Fourier coefficients of $x(t) = \sin(2\pi f_0 t)$ are:

$$X[k] = \frac{1}{T}\int_T \sin(2\pi f_0 t)\,e^{-j2\pi kf_0 t}\,dt = \frac{1}{2j}\left[\frac{1}{T}\int_T e^{-j2\pi(k-1)f_0 t}\,dt - \frac{1}{T}\int_T e^{-j2\pi(k+1)f_0 t}\,dt\right] = \frac{1}{2j}\left(\delta[k-1] - \delta[k+1]\right) \qquad (3.28)$$

Alternatively, writing

$$\cos(2\pi f_0 t) = \frac{1}{2}\left[e^{j2\pi f_0 t} + e^{-j2\pi f_0 t}\right] = \sum_{k=-\infty}^{\infty}X[k]\,e^{j2\pi kf_0 t} \qquad (3.29)$$

and comparing the two sides of the last equal sign, we see that all $X[k] = 0$ except $X[1] = X[-1] = 1/2$, i.e., $X[k] = (\delta[k-1] + \delta[k+1])/2$. Similarly, comparing the two sides of

$$\sin(2\pi f_0 t) = \frac{1}{2j}\left[e^{j2\pi f_0 t} - e^{-j2\pi f_0 t}\right] = \sum_{k=-\infty}^{\infty}X[k]\,e^{j2\pi kf_0 t} \qquad (3.30)$$

we see that all $X[k] = 0$ except $X[1] = 1/2j$ and $X[-1] = -1/2j$, i.e., $X[k] = (\delta[k-1] - \delta[k+1])/2j$. This method can be used to find the Fourier coefficients of any signal containing a small number of complex exponential terms.

- Square wave: let $x(t)$ be a square wave

$$x(t) = \begin{cases}1 & 0 < t < \tau\\ 0 & \tau < t < T\end{cases} \qquad (3.31)$$

The Fourier coefficients of this function are

$$X[k] = \frac{1}{T}\int_T x(t)\,e^{-j2k\pi f_0 t}\,dt = \frac{1}{T}\int_0^{\tau}e^{-j2k\pi f_0 t}\,dt = \frac{1}{j2k\pi}\left(1 - e^{-j2k\pi f_0\tau}\right) = \frac{e^{-jk\pi f_0\tau}}{k\pi}\,\frac{e^{jk\pi f_0\tau} - e^{-jk\pi f_0\tau}}{2j} = \frac{e^{-jk\pi f_0\tau}}{k\pi}\sin(k\pi f_0\tau) \qquad (3.32)$$

A sinc function is commonly defined as

$$\mathrm{sinc}(x) = \frac{\sin(\pi x)}{\pi x}, \qquad \lim_{x\to 0}\mathrm{sinc}(x) = 1 \qquad (3.33)$$

and the expression above for $X[k]$ can be further written as

$$X[k] = f_0\tau\,\frac{\sin(k\pi f_0\tau)}{k\pi f_0\tau}\,e^{-jk\pi f_0\tau} = \frac{\tau}{T}\,\mathrm{sinc}(kf_0\tau)\,e^{-jk\pi f_0\tau} \qquad (3.34)$$

In particular, the DC component is $X[0] = \tau/T$. Also, if $\tau = T/2$, then $X[0] = 1/2$ and $X[k]$ above becomes

$$X[k] = \frac{1}{j2k\pi}\left(1 - e^{-jk\pi}\right) = \frac{e^{-jk\pi/2}}{k\pi}\sin(k\pi/2) \qquad (3.35)$$

Moreover, since $e^{\pm j2k\pi} = 1$ and $e^{\pm j(2k-1)\pi} = -1$, all even terms $X[\pm 2k] = 0$ vanish and the odd terms become

$$X[\pm(2k-1)] = \pm\frac{1}{j\pi(2k-1)}, \qquad (k = 1, 2, \dots) \qquad (3.36)$$

and the Fourier series expansion of the square wave becomes a linear combination of sinusoids:

$$x(t) = \sum_{k=-\infty}^{\infty}X[k]\,e^{j2k\pi f_0 t} = X[0] + \sum_{k=1}^{\infty}\left[\frac{1}{j\pi(2k-1)}e^{j(2k-1)\omega_0 t} - \frac{1}{j\pi(2k-1)}e^{-j(2k-1)\omega_0 t}\right]$$
$$= \frac{1}{2} + \frac{2}{\pi}\sum_{k=1}^{\infty}\frac{\sin((2k-1)\omega_0 t)}{2k-1} = \frac{1}{2} + \frac{2}{\pi}\left[\sin(\omega_0 t) + \frac{\sin(3\omega_0 t)}{3} + \frac{\sin(5\omega_0 t)}{5} + \cdots\right] \qquad (3.37)$$

Except for its DC offset, the function $x(t)$ is odd, and it is therefore composed of sine functions only, with only the odd harmonics present.

- Triangle wave: a triangle wave is defined as an even function

$$x(t) = 2|t|/T, \qquad (|t| \le T/2) \qquad (3.38)$$

First, the DC offset $X[0]$ can be found from the definition:

$$X[0] = \frac{1}{T}\int_T x(t)\,dt = \frac{1}{2} \qquad (3.39)$$

For $k \ne 0$, we realize that this triangle wave can be obtained as an integral of the square wave defined in Eq. 3.31 with these modifications: (a) $\tau = T/2$; (b) the DC offset is removed, $X[0] = 0$; and (c) the wave is vertically scaled by $4/T$. Now, according to the integration property, the Fourier coefficients can be easily obtained from Eq. 3.35 as

$$X[k] = \frac{4}{T}\,\frac{T}{j2k\pi}\,\frac{e^{-jk\pi/2}}{k\pi}\sin(k\pi/2) = \frac{2}{j}\,\frac{\sin(k\pi/2)}{(k\pi)^2}\,e^{-jk\pi/2} = \frac{2\sin(k\pi/2)}{(k\pi)^2}\,(-j)^{k+1} \qquad (3.40)$$

This $X[k]$ is real and even, $X[k] = X[-k]$, with respect to $k$ $(k = 0, \pm1, \pm2, \dots)$. According to the time shift property, the complex exponential $e^{-jk\pi/2}$ corresponds to a right shift by $T/4$. If we shift the signal left by $T/4$, the triangle wave $x(t)$ becomes odd, and the complex exponential term in the expression of $X[k]$ disappears:

$$X[k] = \frac{2}{j}\,\frac{\sin(k\pi/2)}{(k\pi)^2} \qquad (3.41)$$

This is imaginary and odd, $X[k] = -X[-k]$, with respect to $k$ $(k = 0, \pm1, \pm2, \dots)$.

The Fourier series expansion of such an odd triangle wave can be written as below. Except for its DC offset, the function $x(t)$ is odd, and it is composed of sine functions only:

$$x(t) = \sum_{k=-\infty}^{\infty}X[k]\,e^{j2k\pi f_0 t} = \sum_{k=1}^{\infty}\left[X[k]\,e^{j2k\pi f_0 t} + X[-k]\,e^{-j2k\pi f_0 t}\right] = \frac{2}{j}\sum_{k=1}^{\infty}\frac{\sin(k\pi/2)}{(k\pi)^2}\left(e^{j2k\pi f_0 t} - e^{-j2k\pi f_0 t}\right)$$
$$= \frac{4}{\pi^2}\sum_{k=1}^{\infty}\frac{\sin(k\pi/2)}{k^2}\sin(2k\pi f_0 t) = \frac{4}{\pi^2}\left[\sin(2\pi f_0 t) - \frac{1}{9}\sin(6\pi f_0 t) + \frac{1}{25}\sin(10\pi f_0 t) - \cdots\right] \qquad (3.42)$$

- Sawtooth: a sawtooth function is defined as

$$x(t) = t/T, \qquad (0 < t < T) \qquad (3.43)$$

First find $X[0]$, the average or DC component:

$$X[0] = \frac{1}{T}\int_0^T \frac{t}{T}\,dt = \frac{1}{2} \qquad (3.44)$$

Next we find all remaining coefficients $X[k]$ $(k \ne 0)$:

$$X[k] = \frac{1}{T}\int_0^T \frac{t}{T}\,e^{-jk\omega_0 t}\,dt \qquad (3.45)$$

In general, this type of integral can be found using integration by parts:

$$\int t\,e^{at}\,dt = \frac{1}{a^2}(at - 1)\,e^{at} + C \qquad (3.46)$$

Here $a = -jk\omega_0 = -j2k\pi/T$, and we get

$$X[k] = \frac{1}{T^2}\,\frac{1}{(jk\omega_0)^2}\left[(-jk\omega_0 t - 1)\,e^{-jk\omega_0 t}\right]_0^T = \frac{j}{2k\pi} \qquad (3.47)$$

The Fourier series expansion of the function is therefore

$$x(t) = \frac{1}{2} + \sum_{k=1}^{\infty}\left[\frac{j}{2k\pi}e^{jk\omega_0 t} - \frac{j}{2k\pi}e^{-jk\omega_0 t}\right] = \frac{1}{2} - \frac{1}{\pi}\sum_{k=1}^{\infty}\frac{1}{k}\sin(k\omega_0 t) \qquad (3.48)$$

Note that, apart from its DC offset, this sawtooth wave is an odd function, and it is therefore composed of sine functions only.

Some different versions of the square, triangle, and sawtooth waveforms are shown in Fig. 3.2. The corresponding Fourier series expansions of these waveforms are illustrated in Fig. 3.3: the first ten basis functions for the DC component, the fundamental frequency, and progressively higher harmonics are shown on the left, and the reconstructions by inverse transform of the square, triangle, and sawtooth waveforms are shown in the remaining three columns. As we can see, the accuracy of the reconstruction of a waveform improves continuously as more basis functions of higher frequencies are included, so that finer details (corresponding to rapid changes in time) can be better represented.
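The reconstructions in Fig. 3.3 can be reproduced by summing a finite number of terms of the series above. A minimal sketch (plotting omitted; the partial-sum functions follow Eqs. 3.37 and 3.48 directly):

```python
import numpy as np

T = 1.0
w0 = 2 * np.pi / T
t = np.linspace(0.0, 2 * T, 2000)

def square_partial(t, K):
    # Eq. 3.37: 1/2 + (2/pi) sum_{k=1}^{K} sin((2k-1) w0 t)/(2k-1)
    s = 0.5 * np.ones_like(t)
    for k in range(1, K + 1):
        m = 2 * k - 1
        s += (2 / np.pi) * np.sin(m * w0 * t) / m
    return s

def sawtooth_partial(t, K):
    # Eq. 3.48: 1/2 - (1/pi) sum_{k=1}^{K} sin(k w0 t)/k
    s = 0.5 * np.ones_like(t)
    for k in range(1, K + 1):
        s -= (1 / np.pi) * np.sin(k * w0 * t) / k
    return s

for K in (1, 3, 10, 50):                 # more terms give sharper edges
    err = np.mean(np.abs(sawtooth_partial(t, K) - (t % T) / T))
    print(K, round(err, 4))
```

The mean error decreases as more harmonics are included, although the overshoot near each jump discontinuity persists (the well-known Gibbs phenomenon).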

Figure 3.2 Square wave (top), triangle wave (middle), and sawtooth wave (bottom)

- Impulse train: an impulse train, also called a comb function or sampling function, is a sequence of infinitely many unit impulses separated by a time interval $T$:

$$\mathrm{comb}(t) = \sum_{n=-\infty}^{\infty}\delta(t - nT) \qquad (3.49)$$

As a function with period $T$, this impulse train can be Fourier expanded:

$$\mathrm{comb}(t) = \sum_{k=-\infty}^{\infty}\mathrm{Comb}[k]\,e^{j2k\pi t/T} \qquad (3.50)$$

with coefficients

$$\mathrm{Comb}[k] = \frac{1}{T}\int_{-T/2}^{T/2}\mathrm{comb}(t)\,e^{-j2k\pi t/T}\,dt = \frac{1}{T}\int_{-T/2}^{T/2}\sum_{n=-\infty}^{\infty}\delta(t - nT)\,e^{-j2k\pi t/T}\,dt = \frac{1}{T}\int_{-T/2}^{T/2}\delta(t)\,e^{-j2k\pi t/T}\,dt = \frac{1}{T}, \quad (k = 0, \pm1, \pm2, \dots) \qquad (3.51)$$

The last equality follows from the sifting property of the impulse function, as only the single impulse $\delta(t)$ falls inside the interval $[-T/2, T/2]$. Substituting $\mathrm{Comb}[k] = 1/T$ back into the Fourier series expansion of $\mathrm{comb}(t)$, we can also express the impulse train as

$$\mathrm{comb}(t) = \sum_{n=-\infty}^{\infty}\delta(t - nT) = \frac{1}{T}\sum_{k=-\infty}^{\infty}e^{j2k\pi t/T} \qquad (3.52)$$

This is actually the same as Eq. 1.35.

Fig. 3.4 shows a set of periodic signals (left) and their corresponding Fourier coefficients (right).

To carry out the Fourier series expansion of a given signal function $x(t)$, it is necessary to first find its fundamental frequency $f_0$, or equivalently its period $T = 1/f_0$, which is sometimes not explicitly available and therefore needs to be found.

Figure 3.4 Examples of Fourier series expansions: a set of periodic signals (left) and their Fourier expansion coefficients (right) as a function of frequency $f$ (real and imaginary parts are shown in solid and dashed lines, respectively). The first three rows show two sinusoids $x_1(t) = \sin(2\pi 3t)$ and $x_2(t) = \cos(2\pi 1t)$, and their weighted sum $x_1(t) + x_2(t)/5$. The following four rows are for the impulse train, square wave, triangle wave, and sawtooth wave, respectively.

Example 3.1: Consider the signal function $x(t) = \cos(2\pi 4t) + \cos(2\pi 6t)$, containing two sinusoids of frequencies $f_1 = 4$ and $f_2 = 6$, or periods $T_1 = 1/f_1 = 1/4$ and $T_2 = 1/f_2 = 1/6$, respectively. The fundamental frequency $f_0$ of the sum of these component sinusoids can be found as the greatest common divisor (GCD) of the individual frequency components:

$$f_0 = \mathrm{GCD}(f_1, f_2) = \mathrm{GCD}(4, 6) = 2 \qquad (3.53)$$

Or, equivalently, the period $T$ of the sum can be found as the least common multiple (LCM) of the periods of the individual components:

$$T = \mathrm{LCM}(T_1, T_2) = \mathrm{LCM}(1/4, 1/6) = 1/2 \qquad (3.54)$$

Now the signal can be expressed in terms of its fundamental frequency as $x(t) = \cos(2\pi(2f_0)t) + \cos(2\pi(3f_0)t)$, and its Fourier series coefficients can be found to be $X[k] = (\delta[k-2] + \delta[k+2] + \delta[k-3] + \delta[k+3])/2$.
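When the component frequencies are rational, the GCD computation of Eq. 3.53 can be automated with exact rational arithmetic. A small helper (an illustrative sketch; the function name frac_gcd is ours):

```python
from fractions import Fraction
from math import gcd

def frac_gcd(a, b):
    # GCD of two rationals p1/q1 and p2/q2: gcd(p1*q2, p2*q1) / (q1*q2)
    return Fraction(gcd(a.numerator * b.denominator,
                        b.numerator * a.denominator),
                    a.denominator * b.denominator)

f1, f2 = Fraction(4), Fraction(6)
f0 = frac_gcd(f1, f2)
print(f0, 1 / f0)        # fundamental frequency 2, period T = 1/2
```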

3.2 The Fourier Transform of Non-Periodic Signals

3.2.1 Formulation

The Fourier series expansion does not apply to non-periodic signals. To process and analyze such signals in the frequency domain, the concept of the Fourier series expansion needs to be generalized. To do so, we first make a minor modification of the Fourier series expansion pair in Eq. 3.5 by moving the factor $1/T$ from the second equation to the first one:

$$x_T(t) = \frac{1}{T}\sum_{k=-\infty}^{\infty}X[k]\,e^{jk\omega_0 t} = \frac{1}{T}\sum_{k=-\infty}^{\infty}X[k]\,e^{j2k\pi f_0 t}, \qquad X[k] = \int_T x_T(t)\,e^{-jk\omega_0 t}\,dt = \int_T x_T(t)\,e^{-j2k\pi f_0 t}\,dt \qquad (3.55)$$

Here the coefficient $X[k]$ is redefined so that its value is scaled by $T$, and its dimension becomes that of the signal $x_T(t)$ multiplied by time, or divided by frequency (the exponential term $\exp(\pm j2\pi f_0 t)$ is dimensionless).

Next we convert a periodic signal $x_T(t)$ into a non-periodic signal $x(t)$ simply by letting its period approach infinity, $T \to \infty$. At the limit, the following changes take place:

- The gap between two consecutive frequency components approaches zero, $f_0 = 1/T \to 0$, and the discrete frequencies $kf_0$ for all integers $-\infty < k < \infty$ can be replaced by a continuous variable $-\infty < f < \infty$.

- The discrete and periodic basis functions $\phi_k(t) = e^{j2k\pi f_0 t}$ for all $k$ become uncountable and non-periodic, $\phi_f(t) = e^{j2\pi ft}$ for all $f$, forming an orthogonal basis that spans the function space over $(-\infty, \infty)$ (Eq. 1.28):

$$\langle \phi_f(t), \phi_{f'}(t)\rangle = \int_{-\infty}^{\infty}e^{j2\pi(f - f')t}\,dt = \delta(f - f') \qquad (3.56)$$

- The coefficient $X[k]$ for the $k$th basis function, or the $k$th frequency component, $\phi_k(t) = e^{j2k\pi f_0 t}$, is replaced by a continuous weighting function $X(f)$ for the continuous and uncountable basis functions $\phi_f(t) = e^{j2\pi ft}$ for all $f$.

- Let $\Delta f = f_0 = 1/T$; then $1/T = \Delta f \to df$ when $T \to \infty$, and the summation in the first equation of Eq. 3.55 becomes an integral.

Figure 3.5 Fourier transform of non-periodic continuous signals: when the time signal is no longer periodic, its discrete spectrum, given by the Fourier series coefficients, becomes a continuous function.

Due to the changes above, when $T \to \infty$ the two equations in Eq. 3.55 become:

$$x(t) = \lim_{T\to\infty}\frac{1}{T}\sum_{k=-\infty}^{\infty}X[k]\,e^{j2k\pi f_0 t} = \int_{-\infty}^{\infty}X(f)\,e^{j2\pi ft}\,df$$
$$X(f) = \lim_{T\to\infty}\int_T x_T(t)\,e^{-j2k\pi f_0 t}\,dt = \int_{-\infty}^{\infty}x(t)\,e^{-j2\pi ft}\,dt \qquad (3.57)$$

These two equations can be rewritten as the continuous-time Fourier transform (CTFT) pair:

$$X(f) = \mathcal{F}[x(t)] = \int_{-\infty}^{\infty}x(t)\,e^{-j2\pi ft}\,dt, \qquad x(t) = \mathcal{F}^{-1}[X(f)] = \int_{-\infty}^{\infty}X(f)\,e^{j2\pi ft}\,df \qquad (3.58)$$

The first and second equations are the forward and inverse CTFT, respectively, which can be more concisely represented as

$$x(t) \;\overset{\mathcal{F}}{\longleftrightarrow}\; X(f) \qquad (3.59)$$

The weighting function $X(f)$ in Eq. 3.58 is called the Fourier spectrum of $x(t)$, representing how the signal energy is distributed over frequency, just as $x(t)$ represents how the signal energy is distributed over time. A non-periodic signal and its continuous spectrum are illustrated in Fig. 3.5, in comparison with a periodic signal and its discrete spectrum shown in Fig. 3.1.

Eq. 3.58 can be considered as the most generic form of the forward and inverse Fourier transform pair, generally denoted by $\mathcal{F}[\,\cdot\,]$ and $\mathcal{F}^{-1}[\,\cdot\,]$, with different variations depending on the specific nature of the signal $x(t)$, such as whether it is periodic or aperiodic, continuous or discrete (to be considered in the next chapter). For example, the Fourier series expansion in Eq. 3.5 is just a special case of Eq. 3.58, where the Fourier transform is applied to a periodic signal $x(t) = x_T(t)$, and the Fourier coefficients $X[k]$ are just the discrete spectrum $X(f) = \mathcal{F}[x_T(t)]$ of the periodic signal, as shown in the following subsection (Eq. 3.78).

Comparing Eq. 3.58 with Eqs. 2.133 and 2.135, we see that the CTFT is actually the representation of a signal function $x(t)$ by an uncountably infinite set of orthonormal basis functions (Eq. 2.133) defined as

$$\phi_f(t) = e^{j2\pi ft}, \qquad (-\infty < f < \infty) \qquad (3.60)$$

so that the function $x(t)$ can be expressed as a linear combination, an integral, of these basis functions $\phi_f(t)$ over all frequencies $f$:

$$x(t) = \int_{-\infty}^{\infty}X(f)\,\phi_f(t)\,df = \int_{-\infty}^{\infty}X(f)\,e^{j2\pi ft}\,df \qquad (3.61)$$

This is the second equation in Eq. 3.58, and the coefficient function $X(f)$ can be found as the projection of the signal function $x(t)$ onto the basis function $\phi_f(t)$:

$$X(f) = \langle x(t), \phi_f(t)\rangle = \int_{-\infty}^{\infty}x(t)\,e^{-j2\pi ft}\,dt \qquad (3.62)$$

This is the forward CTFT in Eq. 3.58.
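The forward integral in Eq. 3.58 can be approximated numerically by a Riemann sum over a sufficiently long, finely sampled window. The sketch below (an illustrative computation; it uses the Gaussian pair $e^{-\pi t^2} \leftrightarrow e^{-\pi f^2}$, derived later in this chapter in Eq. 3.160 with $a = 1$) compares the numerical and analytic spectra:

```python
import numpy as np

# X(f) ~ sum_n x(t_n) exp(-j 2 pi f t_n) dt, for x(t) = exp(-pi t^2).
t = np.linspace(-10.0, 10.0, 4001)
dt = t[1] - t[0]
x = np.exp(-np.pi * t**2)

for f in (0.0, 0.5, 1.0, 2.0):
    Xf = np.sum(x * np.exp(-2j * np.pi * f * t)) * dt
    print(f, round(Xf.real, 6), round(np.exp(-np.pi * f**2), 6))
```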

The Fourier transform pair in Eq. 3.58 can also be equivalently represented in terms of the angular frequency $\omega = 2\pi f$:

$$X(\omega) = \mathcal{F}[x(t)] = \int_{-\infty}^{\infty}x(t)\,e^{-j\omega t}\,dt, \qquad x(t) = \mathcal{F}^{-1}[X(\omega)] = \frac{1}{2\pi}\int_{-\infty}^{\infty}X(\omega)\,e^{j\omega t}\,d\omega \qquad (3.63)$$

In some of the literature, the CTFT spectrum $X(f)$ or $X(\omega)$ is also denoted by $X(j\omega)$, as it takes this form when treated as a special case of the Laplace transform, to be considered in Chapter ??. However, all these different forms are just notational variations of the same spectrum, a function of the frequency $f$ or the angular frequency $\omega = 2\pi f$. We will use these notations interchangeably, whichever is most convenient and suitable in the specific discussion, as no confusion should be caused given the context. Moreover, we also note that when the spectrum is denoted by $X(f)$, the Fourier transform pair in Eq. 3.58 appears symmetric between the time and frequency domains, so that the time-frequency duality is more clearly revealed.

In order for the integral in Eq. 3.58 to converge, i.e., for $X(f)$ to exist, the signal $x(t)$ needs to satisfy the following Dirichlet conditions:

1. $x(t)$ is absolutely integrable:

$$\int_{-\infty}^{\infty}|x(t)|\,dt < \infty \qquad (3.64)$$

2. $x(t)$ has a finite number of maxima and minima within any finite interval;

3. $x(t)$ has a finite number of discontinuities within any finite interval.

Alternatively, a stricter condition for the convergence of the integral is that $x(t)$ is an energy signal, $x(t) \in L^2(\mathbb{R})$, i.e., it is square-integrable (Eq. 2.29). As some obvious examples, signals such as $x(t) = t$ and $x(t) = t^2$ grow without bound as $t \to \infty$, and therefore their Fourier spectra do not exist. However, we note that the Dirichlet conditions are sufficient but not necessary, as there also exist signals that do not satisfy them but whose Fourier spectra still exist. For example, some important and commonly used signals such as $x(t) = 1$ and $x(t) = u(t)$ are neither square-integrable nor absolutely integrable, but their Fourier spectra can still be obtained, thanks to the introduction of the Dirac delta, a non-conventional function containing a value of infinity. The integrals of these functions can be considered to be marginally convergent.

Similar to the Fourier series expansion, the Fourier transform is also a unitary transformation $\mathcal{F}[x(t)] = X(f)$ that conserves inner products (Theorem 2.6):

$$\langle x(t), y(t)\rangle = \int_{-\infty}^{\infty}x(t)\,\overline{y(t)}\,dt = \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty}X(f)\,e^{j2\pi ft}\,df\right]\overline{\left[\int_{-\infty}^{\infty}Y(f')\,e^{j2\pi f't}\,df'\right]}\,dt$$
$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}X(f)\,\overline{Y(f')}\left[\int_{-\infty}^{\infty}e^{j2\pi(f - f')t}\,dt\right]df\,df' = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}X(f)\,\overline{Y(f')}\,\delta(f - f')\,df\,df' = \int_{-\infty}^{\infty}X(f)\,\overline{Y(f)}\,df = \langle X(f), Y(f)\rangle \qquad (3.65)$$

Replacing $y(t)$ by $x(t)$ in Eq. 3.65 above, we get Parseval's identity:

$$\|x(t)\|^2 = \langle x(t), x(t)\rangle = \langle X(f), X(f)\rangle = \|X(f)\|^2 \qquad (3.66)$$

As a unitary transformation, the Fourier transform can be considered as a rotation of the basis of the function space. Before the transform, the function is represented as a linear combination of an uncountable set of standard basis functions $\delta(t - \tau)$, each for a particular time moment $t = \tau$, weighted by the coefficient function $x(\tau)$, the signal amplitude at that moment:

$$x(t) = \int_{-\infty}^{\infty}x(\tau)\,\delta(t - \tau)\,d\tau \qquad (3.67)$$

After the transformation, the function is represented as a linear combination of a different set of orthonormal basis functions $e^{j2\pi ft}$, a rotated version of the standard basis, weighted by the spectrum $X(f)$ for each frequency component:

$$x(t) = \int_{-\infty}^{\infty}X(f)\,e^{j2\pi ft}\,df \qquad (3.68)$$

The representations of the signal as a function $x(t)$ in the time domain and as a spectrum $X(f)$ in the frequency domain are equivalent, in the sense that the total amount of energy or information is conserved according to Parseval's identity. However, how the total energy is distributed over time $t$ or over frequency $f$ can be very different, which is one of the reasons why the Fourier transform is carried out in the first place.

Example 3.2: Here we consider the Fourier transforms of a few special signals:

- The unit impulse or Dirac delta:

$$\mathcal{F}[\delta(t)] = \int_{-\infty}^{\infty}\delta(t)\,e^{-j2\pi ft}\,dt = e^{-j2\pi f\cdot 0} = 1 \qquad (3.69)$$

The second equal sign is due to the sifting property of the impulse function.

- The constant function:

$$\mathcal{F}[1] = \int_{-\infty}^{\infty}e^{-j2\pi ft}\,dt = \delta(f) \qquad (3.70)$$

- The unit step, defined as

$$u(t) = \begin{cases}0 & t < 0\\ 1/2 & t = 0\\ 1 & t > 0\end{cases} \qquad (3.71)$$

Its Fourier transform is (Eq. 1.30)

$$\mathcal{F}[u(t)] = \int_{-\infty}^{\infty}u(t)\,e^{-j2\pi ft}\,dt = \int_0^{\infty}e^{-j2\pi ft}\,dt = \frac{1}{2}\delta(f) + \frac{1}{j2\pi f} \qquad (3.72)$$

Similarly, we also have (Eq. 1.31)

$$\mathcal{F}[u(-t)] = \int_{-\infty}^0 e^{-j2\pi ft}\,dt = \frac{1}{2}\delta(f) - \frac{1}{j2\pi f} \qquad (3.73)$$

Note that the term $\delta(f)/2$ is for the DC component of the unit step. These results can be verified based on the fact that $u(-t) = 1 - u(t)$:

$$\mathcal{F}[u(-t)] = \mathcal{F}[1] - \mathcal{F}[u(t)] = \delta(f) - \frac{1}{2}\delta(f) - \frac{1}{j2\pi f} = \frac{1}{2}\delta(f) - \frac{1}{j2\pi f} \qquad (3.74)$$

- The sign function $x(t) = \mathrm{sgn}(t)$, defined as

$$\mathrm{sgn}(t) = 2u(t) - 1 = \begin{cases}-1 & t < 0\\ 0 & t = 0\\ 1 & t > 0\end{cases} \qquad (3.75)$$

Due to the linearity of the Fourier transform, its spectrum can be found to be

$$\mathcal{F}[\mathrm{sgn}(t)] = 2\,\mathcal{F}[u(t)] - \mathcal{F}[1] = \delta(f) + \frac{1}{j\pi f} - \delta(f) = \frac{1}{j\pi f} \qquad (3.76)$$

Note that the term $\delta(f)$ disappears, as the sign function has zero DC component.

3.2.2 Relation to the Fourier Expansion

Now let us consider how the Fourier spectrum of a periodic function is related to its Fourier expansion coefficients.

The Fourier expansion of a periodic function $x_T(t)$ is

$$x_T(t) = \sum_{k=-\infty}^{\infty}X[k]\,e^{j2k\pi t/T} = \sum_{k=-\infty}^{\infty}X[k]\,e^{j2k\pi f_0 t} \qquad (3.77)$$

where $f_0 = 1/T$ is the fundamental frequency and $X[k]$ the $k$th expansion coefficient. The Fourier transform of this periodic function $x_T(t)$ can be found to be

$$X(f) = \int_{-\infty}^{\infty}x_T(t)\,e^{-j2\pi ft}\,dt = \int_{-\infty}^{\infty}\left[\sum_{k=-\infty}^{\infty}X[k]\,e^{j2k\pi f_0 t}\right]e^{-j2\pi ft}\,dt = \sum_{k=-\infty}^{\infty}X[k]\int_{-\infty}^{\infty}e^{-j2\pi(f - kf_0)t}\,dt = \sum_{k=-\infty}^{\infty}X[k]\,\delta(f - kf_0) \qquad (3.78)$$

Here we have used the result of Eq. 3.56. It is clear that the spectrum of a periodic function is discrete, in the sense that it is nonzero only at the set of discrete frequencies $f = kf_0$, where it contains an impulse $X[k]\,\delta(f - kf_0)$ weighted by the $k$th Fourier coefficient. This result also illustrates an important point: while the dimension of the Fourier coefficient $X[k]$ is the same as that of the signal $x_T(t)$, i.e., $[X[k]] = [x_T(t)]$, the dimension of the spectrum is

$$[X(f)] = [X[k]]\,[t] = \frac{[X[k]]}{[f]} \qquad (3.79)$$

As the dimension of $X(f)$ is that of the signal $x(t)$ multiplied by time, or divided by frequency, $X(f)$ is actually a frequency density function. In the future we will loosely use the term spectrum not only for a continuous function $X(f)$ of frequency $f$, but also for the discrete transform coefficients $X[k]$, as they can always be associated with a continuous function as in Eq. 3.78.

Next, we consider how the Fourier spectrum $X(f)$ of a signal $x(t)$ is related to the Fourier series coefficients of its periodic extension, defined as

$$x^\circ(t) = \sum_{n=-\infty}^{\infty}x(t + nT) = x^\circ(t + T) \qquad (3.80)$$

As $x^\circ(t + T) = x^\circ(t)$ is periodic, it can be Fourier expanded, and its $k$th Fourier coefficient is

$$X^\circ[k] = \frac{1}{T}\int_0^T x^\circ(t)\,e^{-j2\pi kt/T}\,dt = \frac{1}{T}\int_0^T\left[\sum_{n=-\infty}^{\infty}x(t + nT)\right]e^{-j2\pi kt/T}\,dt = \frac{1}{T}\sum_{n=-\infty}^{\infty}\int_0^T x(t + nT)\,e^{-j2\pi kt/T}\,dt \qquad (3.81)$$

If we define $\tau = t + nT$, i.e., $t = \tau - nT$, the above becomes

$$X^\circ[k] = \frac{1}{T}\sum_{n=-\infty}^{\infty}\int_{nT}^{(n+1)T}x(\tau)\,e^{-j2\pi k\tau/T}\,e^{j2\pi nk}\,d\tau = \frac{1}{T}\int_{-\infty}^{\infty}x(\tau)\,e^{-j2\pi k\tau/T}\,d\tau = \frac{1}{T}\,X\!\left(\frac{k}{T}\right) \qquad (3.82)$$

(Note that $e^{j2\pi nk} = 1$, as $k$ and $n$ are both integers.) This equation relates the Fourier transform $X(f)$ of a signal $x(t)$ to the Fourier series coefficients $X^\circ[k]$ of the periodic extension $x^\circ(t)$ of the signal. Now the Fourier expansion of $x^\circ(t)$ can be written as

$$x^\circ(t) = \sum_{k=-\infty}^{\infty}X^\circ[k]\,e^{j2\pi kt/T} = \frac{1}{T}\sum_{k=-\infty}^{\infty}X\!\left(\frac{k}{T}\right)e^{j2\pi kt/T} \qquad (3.83)$$

This equation is a form of the Poisson summation formula (cf. Eq. 1.35, which is recovered in the special case $X(f) = \mathcal{F}[\delta(t)] = 1$).

3.2.3 Properties of the Fourier Transform

Here we consider a set of properties of the Fourier transform, many of which should look similar to those of the Fourier series expansion discussed before; since the Fourier expansion is just a special case (for periodic signals) of the Fourier transform, it naturally shares all of its properties. In the following, we always assume that $x(t)$ and $y(t)$ are two complex functions (real as a special case) with $\mathcal{F}[x(t)] = X(f)$ and $\mathcal{F}[y(t)] = Y(f)$.

- Linearity:

$$\mathcal{F}[ax(t) + by(t)] = a\,\mathcal{F}[x(t)] + b\,\mathcal{F}[y(t)] \qquad (3.84)$$

The Fourier transform of a function $x(t)$ is simply an inner product of the function with the kernel function $\phi_f(t) = e^{j2\pi ft}$ (Eq. 3.62). Due to the linearity of the inner product in its first variable, the Fourier transform is also linear.

- Time-frequency duality:

$$\text{if } \mathcal{F}[x(t)] = X(f), \text{ then } \mathcal{F}[X(t)] = x(-f) \qquad (3.85)$$

Proof: we have

$$x(t) = \mathcal{F}^{-1}[X(f)] = \int_{-\infty}^{\infty}X(f)\,e^{j2\pi ft}\,df \qquad (3.86)$$

Defining $t' = -t$, we get

$$x(-t') = \int_{-\infty}^{\infty}X(f)\,e^{-j2\pi ft'}\,df \qquad (3.87)$$

Interchanging the variables $t'$ and $f$, we get

$$x(-f) = \int_{-\infty}^{\infty}X(t')\,e^{-j2\pi ft'}\,dt' = \mathcal{F}[X(t)] \qquad (3.88)$$

In particular, if $x(t) = x(-t)$ is even, we have

$$\text{if } \mathcal{F}[x(t)] = X(f), \text{ then } \mathcal{F}[X(t)] = x(f) \qquad (3.89)$$

This duality is simply a result of the definition of the forward and inverse transforms in Eq. 3.58, which are highly symmetric between time and frequency.

Consequently, many of the properties and transforms of typical functions exhibit a strong duality between the time and frequency domains.

- Even and odd signals: if the signal is even, then its spectrum is also even:

$$\text{if } x(t) = x(-t), \text{ then } X(f) = X(-f) \qquad (3.90)$$

Proof:

$$X(f) = \int_{-\infty}^{\infty}x(t)\,e^{-j2\pi ft}\,dt = \int_{-\infty}^{\infty}x(-t)\,e^{-j2\pi ft}\,dt = \int_{-\infty}^{\infty}x(t')\,e^{j2\pi ft'}\,dt' = X(-f) \qquad (3.91)$$

where we have substituted $t' = -t$. If the signal is odd, then its spectrum is also odd:

$$\text{if } x(t) = -x(-t), \text{ then } X(f) = -X(-f) \qquad (3.92)$$

The proof is similar to the above.

- Time reversal:

$$\mathcal{F}[x(-t)] = X(-f) \qquad (3.93)$$

i.e., if the signal $x(t)$ is flipped in time with respect to the origin $t = 0$, its spectrum $X(f)$ is also flipped in frequency with respect to the origin $f = 0$. Proof:

$$\mathcal{F}[x(-t)] = \int_{-\infty}^{\infty}x(-t)\,e^{-j2\pi ft}\,dt = \int_{-\infty}^{\infty}x(t')\,e^{j2\pi ft'}\,dt' = X(-f) \qquad (3.94)$$

where we have substituted $t' = -t$. In particular, when $x(t) = \overline{x(t)}$ is real,

$$\mathcal{F}[x(-t)] = X(-f) = \int_{-\infty}^{\infty}x(t)\,e^{j2\pi ft}\,dt = \overline{\int_{-\infty}^{\infty}x(t)\,e^{-j2\pi ft}\,dt} = \overline{X(f)} \qquad (3.95)$$

- Multiplication (Plancherel) theorem:

$$\langle x(t), y(t)\rangle = \int_{-\infty}^{\infty}x(t)\,\overline{y(t)}\,dt = \int_{-\infty}^{\infty}X(f)\,\overline{Y(f)}\,df = \langle X(f), Y(f)\rangle \qquad (3.96)$$

This is Eq. 3.65, indicating that the Fourier transform is a unitary transformation that conserves inner products. In particular, letting $y(t) = x(t)$, we get Parseval's identity, representing signal energy conservation under the Fourier transform:

$$\|x(t)\|^2 = \int_{-\infty}^{\infty}|x(t)|^2\,dt = \int_{-\infty}^{\infty}|X(f)|^2\,df = \int_{-\infty}^{\infty}S_x(f)\,df = \|X(f)\|^2 \qquad (3.97)$$

where $|x(t)|^2$ and $S_x(f) = |X(f)|^2$ are respectively the signal energy distributions over time and over frequency; $S_x(f)$ is defined as the power density spectrum (PDS) of the signal.

- Time and frequency scaling:

$$\mathcal{F}[x(at)] = \frac{1}{|a|}X\!\left(\frac{f}{a}\right) \qquad (3.98)$$

Proof: first we assume a positive scaling factor $a > 0$ and get

$$\mathcal{F}[x(at)] = \int_{-\infty}^{\infty}x(at)\,e^{-j2\pi ft}\,dt = \int_{-\infty}^{\infty}x(u)\,e^{-j2\pi fu/a}\,d\!\left(\frac{u}{a}\right) = \frac{1}{a}X\!\left(\frac{f}{a}\right) \qquad (3.99)$$

where we have substituted $u = at$. Applying the time-reversal property to this result, we get

$$\mathcal{F}[x(-at)] = \frac{1}{a}X\!\left(-\frac{f}{a}\right) \qquad (3.100)$$

Letting $a' = -a < 0$, we get the following for a negative scaling factor:

$$\mathcal{F}[x(a't)] = \frac{1}{|a'|}X\!\left(\frac{f}{a'}\right) \qquad (3.101)$$

Combining the above results for both positive and negative scaling factors, we get Eq. 3.98. If $|a| < 1$, the signal is stretched, and its spectrum is compressed and scaled up. When $a \to 0$, $x(at)$ is so stretched that it approaches a constant, and its spectrum is compressed and scaled up to the extent that it approaches an impulse. On the other hand, if $|a| > 1$, the signal is compressed and its spectrum is stretched and scaled down. When $a \to \infty$, if we redefine the signal as $a\,x(at)$, with spectrum $X(f/a)$, the signal becomes an impulse while its spectrum $X(f/a)$ becomes a constant.

- Time and frequency shift:

$$\mathcal{F}[x(t \pm t_0)] = e^{\pm j2\pi ft_0}\,X(f) \qquad (3.102)$$

$$\mathcal{F}^{-1}[X(f \pm f_0)] = e^{\mp j2\pi f_0 t}\,x(t) \qquad (3.103)$$

Proof: we first prove Eq. 3.102:

$$\mathcal{F}[x(t \pm t_0)] = \int_{-\infty}^{\infty}x(t \pm t_0)\,e^{-j2\pi ft}\,dt \qquad (3.104)$$

Let $t' = t \pm t_0$; then $t = t' \mp t_0$, $dt = dt'$, and the above becomes

$$\mathcal{F}[x(t \pm t_0)] = \int_{-\infty}^{\infty}x(t')\,e^{-j2\pi f(t' \mp t_0)}\,dt' = e^{\pm j2\pi ft_0}\,X(f) \qquad (3.105)$$

We see that a time shift $t_0$ of the signal corresponds to a phase shift $2\pi ft_0$ for each frequency component $e^{j2\pi ft}$. This result can be intuitively understood: as the phase shift is proportional to the frequency, a higher-frequency component receives a greater phase shift and a lower-frequency component a smaller one, so that the relative positions of all harmonics remain the same, and the shape of the signal, as a superposition of these harmonics, remains unchanged when shifted.

Applying the time-frequency duality to the time shift property in Eq. 3.102, we get the frequency shift property in Eq. 3.103.

- Correlation: the cross-correlation between two functions $x(t)$ and $y(t)$ is defined as

$$r_{xy}(t) = x(t) \star y(t) = \int_{-\infty}^{\infty}x(\tau)\,\overline{y(\tau - t)}\,d\tau \qquad (3.106)$$

Its Fourier transform is

$$\mathcal{F}[r_{xy}(t)] = \mathcal{F}[x(t) \star y(t)] = X(f)\,\overline{Y(f)} \qquad (3.107)$$

Proof: as $\mathcal{F}[x(\tau)] = X(f)$ and $\mathcal{F}[y(\tau - t)] = Y(f)\,e^{-j2\pi ft}$ (both as functions of $\tau$), we can easily prove the property by applying the multiplication theorem:

$$r_{xy}(t) = \int_{-\infty}^{\infty}x(\tau)\,\overline{y(\tau - t)}\,d\tau = \int_{-\infty}^{\infty}X(f)\,\overline{Y(f)}\,e^{j2\pi ft}\,df = \int_{-\infty}^{\infty}S_{xy}(f)\,e^{j2\pi ft}\,df = \mathcal{F}^{-1}[S_{xy}(f)] \qquad (3.108)$$

where $S_{xy}(f)$ is the cross power density spectrum of the two signals, defined as

$$S_{xy}(f) = X(f)\,\overline{Y(f)} = \mathcal{F}[r_{xy}(t)] \qquad (3.109)$$

If both signals $x(t) = \overline{x(t)}$ and $y(t) = \overline{y(t)}$ are real, i.e., $\overline{X(f)} = X(-f)$ and $\overline{Y(f)} = Y(-f)$, then we have $S_{xy}(f) = X(f)\,Y(-f)$. In particular, when $x(t) = y(t)$ we have

$$r_x(t) = \int_{-\infty}^{\infty}x(\tau)\,\overline{x(\tau - t)}\,d\tau = \int_{-\infty}^{\infty}S_x(f)\,e^{j2\pi ft}\,df = \mathcal{F}^{-1}[S_x(f)] \qquad (3.110)$$

where $r_x(t)$ is the auto-correlation, and $S_x(f) = X(f)\,\overline{X(f)} = |X(f)|^2$ is the power density spectrum of $x(t)$.

- Convolution theorem: as first defined by Eq. 1.86 in Chapter 1, the convolution of two functions $x(t)$ and $y(t)$ is

$$z(t) = x(t) * y(t) = \int_{-\infty}^{\infty}x(\tau)\,y(t - \tau)\,d\tau = \int_{-\infty}^{\infty}y(\tau)\,x(t - \tau)\,d\tau = y(t) * x(t) \qquad (3.111)$$

If $y(t) = y(-t)$ is even, then the convolution $x(t) * y(t)$ is the same as the correlation $x(t) \star y(t)$. The convolution theorem states:

$$\mathcal{F}[x(t) * y(t)] = X(f)\,Y(f) \qquad (3.112)$$

$$\mathcal{F}[x(t)\,y(t)] = X(f) * Y(f) \qquad (3.113)$$

Proof:

$$\mathcal{F}[x(t) * y(t)] = \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty}x(\tau)\,y(t - \tau)\,d\tau\right]e^{-j2\pi ft}\,dt = \int_{-\infty}^{\infty}x(\tau)\,e^{-j2\pi f\tau}\left[\int_{-\infty}^{\infty}y(t - \tau)\,e^{-j2\pi f(t - \tau)}\,dt\right]d\tau$$
$$= \int_{-\infty}^{\infty}x(\tau)\,e^{-j2\pi f\tau}\,Y(f)\,d\tau = X(f)\,Y(f) \qquad (3.114)$$

Similarly, we can also prove

$$\mathcal{F}[x(t)\,y(t)] = X(f) * Y(f) \qquad (3.115)$$

In particular, as shown in Eq. 1.85 in Chapter 1, the output $y(t)$ of an LTI system can be found as the convolution $y(t) = h(t) * x(t)$ of its impulse response $h(t)$ and the input $x(t)$. Now, according to the convolution theorem, the output of the system can be more conveniently obtained in the frequency domain by a multiplication:

$$Y(f) = H(f)\,X(f) \qquad (3.116)$$

where $X(f)$ and $Y(f)$ are respectively the spectra of the input $x(t)$ and the output $y(t)$, and $H(f) = \mathcal{F}[h(t)]$, the Fourier transform of the impulse response function $h(t)$, is the frequency response function (FRF) of the system, first defined by Eq. 1.91 in Chapter 1. (A numerical check of the convolution theorem for sampled signals is sketched at the end of this subsection.)

- Time derivative:

$$\mathcal{F}\left[\frac{d}{dt}x(t)\right] = j2\pi f\,X(f) = j\omega\,X(\omega) \qquad (3.117)$$

Proof:

$$\frac{d}{dt}x(t) = \frac{d}{dt}\int_{-\infty}^{\infty}X(f)\,e^{j2\pi ft}\,df = \int_{-\infty}^{\infty}X(f)\,\frac{d}{dt}e^{j2\pi ft}\,df = \int_{-\infty}^{\infty}j2\pi f\,X(f)\,e^{j2\pi ft}\,df = \mathcal{F}^{-1}[j2\pi f\,X(f)] \qquad (3.118)$$

Repeating this process, we get

$$\mathcal{F}\left[\frac{d^n}{dt^n}x(t)\right] = (j2\pi f)^n\,X(f) \qquad (3.119)$$

- Frequency derivative:

$$\mathcal{F}[t\,x(t)] = \frac{j}{2\pi}\,\frac{d}{df}X(f), \qquad \mathcal{F}[t^n\,x(t)] = \frac{j^n}{(2\pi)^n}\,\frac{d^n}{df^n}X(f) \qquad (3.120)$$

The proof is very similar to the above.

- Time integration: the Fourier transform of a time integration is

$$\mathcal{F}\left[\int_{-\infty}^t x(\tau)\,d\tau\right] = \frac{1}{j2\pi f}X(f) + \frac{1}{2}X(0)\,\delta(f) \qquad (3.121)$$

Proof: the integral of a signal $x(t)$ can be considered as its convolution with $u(t)$:

$$x(t) * u(t) = \int_{-\infty}^{\infty}x(\tau)\,u(t - \tau)\,d\tau = \int_{-\infty}^t x(\tau)\,d\tau \qquad (3.122)$$

Due to the convolution theorem, we have

$$\mathcal{F}\left[\int_{-\infty}^t x(\tau)\,d\tau\right] = \mathcal{F}[x(t) * u(t)] = X(f)\left[\frac{1}{j2\pi f} + \frac{1}{2}\delta(f)\right] = \frac{1}{j2\pi f}X(f) + \frac{1}{2}X(0)\,\delta(f) \qquad (3.123)$$

Comparing Eqs. 3.117 and 3.121, we see that the time derivative and the time integral are inverse operations of each other in the frequency domain, as well as in the time domain. However, the second term in Eq. 3.121 is necessary for representing the DC component $X(0)$ of the signal $x(t)$, while Eq. 3.117 has no corresponding term, as the derivative operation is insensitive to any DC component in the signal.

- Complex conjugate:

$$\mathcal{F}[\overline{x(t)}] = \overline{X(-f)} \qquad (3.124)$$

Proof: taking the complex conjugate of the inverse Fourier transform, we get

$$\overline{x(t)} = \overline{\int_{-\infty}^{\infty}X(f)\,e^{j2\pi ft}\,df} = \int_{-\infty}^{\infty}\overline{X(f)}\,e^{-j2\pi ft}\,df \qquad (3.125)$$

$$= \int_{-\infty}^{\infty}\overline{X(-f')}\,e^{j2\pi f't}\,df' = \mathcal{F}^{-1}[\overline{X(-f)}] \qquad (3.126)$$

where we have substituted $f' = -f$.

- Real and imaginary signals: if $x(t)$ is real, then the real part $X_r(f)$ of its spectrum is even and the imaginary part $X_j(f)$ is odd:

$$X_r(f) = X_r(-f), \qquad X_j(f) = -X_j(-f) \qquad (3.127)$$

Proof: as $x(t) = \overline{x(t)}$ is real, i.e., $\mathcal{F}[x(t)] = \mathcal{F}[\overline{x(t)}]$, from Eq. 3.124 we get

$$X(f) = \overline{X(-f)}, \qquad \text{i.e.,}\quad X_r(f) + jX_j(f) = X_r(-f) - jX_j(-f) \qquad (3.128)$$

Equating the real and imaginary parts on both sides, we get Eq. 3.127. Moreover, when the real signal is either even or odd, we have the following results based on Eqs. 3.90 and 3.92:

  - If $x(t) = x(-t)$ is even, then $X(f)$ is also even, i.e., $X_j(f) = 0$ and $X(f) = X_r(f) = X_r(-f)$ is real and even.
  - If $x(t) = -x(-t)$ is odd, then $X(f)$ is also odd, i.e., $X_r(f) = 0$ and $X(f) = jX_j(f) = -jX_j(-f)$ is imaginary and odd.

If $x(t)$ is imaginary, then the real part $X_r(f)$ of its spectrum is odd and the imaginary part $X_j(f)$ is even:

$$X_r(f) = -X_r(-f), \qquad X_j(f) = X_j(-f) \qquad (3.129)$$

Proof: as $x(t) = -\overline{x(t)}$ is imaginary, i.e., $\mathcal{F}[x(t)] = -\mathcal{F}[\overline{x(t)}]$, from Eq. 3.124 we get

$$X(f) = -\overline{X(-f)}, \qquad \text{i.e.,}\quad X_r(f) + jX_j(f) = -X_r(-f) + jX_j(-f) \qquad (3.130)$$

Equating the real and imaginary parts on both sides, we get Eq. 3.129. Moreover, when the imaginary signal is either even or odd, we have the following results based on Eqs. 3.90 and 3.92:

  - If $x(t) = x(-t)$ is even, then $X(f)$ is also even, i.e., $X_r(f) = 0$ and $X(f) = jX_j(f) = jX_j(-f)$ is imaginary and even.
  - If $x(t) = -x(-t)$ is odd, then $X(f)$ is also odd, i.e., $X_j(f) = 0$ and $X(f) = X_r(f) = -X_r(-f)$ is real and odd.

These results are summarized in Table 3.1.

Table 3.1. Symmetry properties of the Fourier transform

  x(t) = x_r(t) + j x_j(t)             X(f) = X_r(f) + j X_j(f)
  x(t) real                            X_r(f) even, X_j(f) odd
  x(t) real and even                   X(f) = X_r(f) real and even (X_j(f) = 0)
  x(t) real and odd                    X(f) = j X_j(f) imaginary and odd (X_r(f) = 0)
  x(t) imaginary                       X_r(f) odd, X_j(f) even
  x(t) imaginary and even              X(f) = j X_j(f) imaginary and even (X_r(f) = 0)
  x(t) imaginary and odd               X(f) = X_r(f) real and odd (X_j(f) = 0)

The complex spectrum $X(f)$ of a time signal $x(t)$ can be expressed either in Cartesian form, in terms of the real and imaginary parts $X_r(f)$ and $X_j(f)$, or in polar form, in terms of the magnitude $|X(f)|$ and phase $\angle X(f)$:

$$X(f) = X_r(f) + jX_j(f) = |X(f)|\,e^{j\angle X(f)} \qquad (3.131)$$

where

$$\begin{cases}|X(f)| = \sqrt{X_r^2(f) + X_j^2(f)}\\ \angle X(f) = \tan^{-1}[X_j(f)/X_r(f)]\end{cases} \qquad \begin{cases}X_r(f) = |X(f)|\cos\angle X(f)\\ X_j(f) = |X(f)|\sin\angle X(f)\end{cases} \qquad (3.132)$$

We see that when the signal is either real or imaginary, $|X(f)|$ is always even and $\angle X(f)$ is always odd.

- Physical interpretation: the signal $x(t)$ can be expressed in terms of its spectrum as

$$x(t) = \int_{-\infty}^{\infty}X(f)\,e^{j2\pi ft}\,df = \int_{-\infty}^{\infty}|X(f)|\,e^{j(2\pi ft + \angle X(f))}\,df$$
$$= \int_{-\infty}^{\infty}|X(f)|\cos(2\pi ft + \angle X(f))\,df + j\int_{-\infty}^{\infty}|X(f)|\sin(2\pi ft + \angle X(f))\,df \qquad (3.133)$$

If $x(t)$ is real (as most signals in practice are), the second term is zero, while the first term, an integral of an even function of $f$, remains, and we have

$$x(t) = 2\int_0^{\infty}|X(f)|\cos(2\pi ft + \angle X(f))\,df \qquad (3.134)$$

We see that the Fourier transform expresses a real time signal as a superposition of uncountably many frequency components, each with a different frequency $f$, magnitude $|X(f)|$, and phase $\angle X(f)$. Note that Eq. 3.15 for periodic signals is just the discrete version of the equation above.
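As promised above, here is a numerical check of the convolution theorem (Eq. 3.112) on sampled signals. This is a sketch of the discrete analogue of the theorem: for finite sequences, the DFT of their linear convolution equals the product of their zero-padded DFTs (the test signals are arbitrary random sequences):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(64)
y = rng.standard_normal(64)

z = np.convolve(x, y)                    # linear convolution, length 127
L = len(z)
Z1 = np.fft.fft(z)                       # transform of the convolution
Z2 = np.fft.fft(x, L) * np.fft.fft(y, L) # product of zero-padded transforms

print(np.allclose(Z1, Z2))               # True: F[x * y] = X(f) Y(f)
```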

3.2.4 Fourier Spectra of Typical Functions

- Unit impulse: the Fourier transform of the unit impulse function is given in Eq. 3.69, according to the definition of the Fourier transform:

$$\mathcal{F}[\delta(t)] = \int_{-\infty}^{\infty}\delta(t)\,e^{-j2\pi ft}\,dt = 1 \qquad (3.135)$$

- Sign function: the Fourier transform of the sign function $\mathrm{sgn}(t)$ is given in Eq. 3.76:

$$\mathcal{F}[\mathrm{sgn}(t)] = \frac{1}{j\pi f} \qquad (3.136)$$

Note that $\mathrm{sgn}(t)$ is real and odd, and its spectrum is imaginary and odd. Moreover, based on the time-frequency duality property, we also get

$$\mathcal{F}\left[\frac{1}{t}\right] = -j\pi\,\mathrm{sgn}(f) \qquad (3.137)$$

- Unit step function: as the unit step is the time integral of the unit impulse,

$$u(t) = \int_{-\infty}^t \delta(\tau)\,d\tau \qquad (3.138)$$

and $\mathcal{F}[\delta(t)] = 1$, $\mathcal{F}[u(t)]$ can be found according to the time integration property (Eq. 3.121) to be

$$\mathcal{F}[u(t)] = \frac{1}{j2\pi f} + \frac{1}{2}\delta(f) \qquad (3.139)$$

which is the same as Eq. 3.72. Moreover, due to the time reversal property $\mathcal{F}[x(-t)] = X(-f)$, we can also get the Fourier transform of a left-sided unit step:

$$\mathcal{F}[u(-t)] = \frac{1}{2}\delta(-f) - \frac{1}{j2\pi f} = \frac{1}{2}\delta(f) - \frac{1}{j2\pi f} \qquad (3.140)$$

(as $\delta(-f) = \delta(f)$).

- Constant: as a constant time function $x(t) = 1$ is not square-integrable, the integral of its Fourier transform does not converge in the conventional sense:

$$\mathcal{F}[1] = \int_{-\infty}^{\infty}e^{-j2\pi ft}\,dt \qquad (3.141)$$

However, we realize that the constant time function is simply the sum of a right-sided unit step and a left-sided unit step, $x(t) = 1 = u(t) + u(-t)$, and according to the linearity of the Fourier transform we have:

$$\mathcal{F}[1] = \mathcal{F}[u(t)] + \mathcal{F}[u(-t)] = \frac{1}{j2\pi f} + \frac{1}{2}\delta(f) - \frac{1}{j2\pi f} + \frac{1}{2}\delta(f) = \delta(f) \qquad (3.142)$$

Alternatively, the Fourier transform of the constant 1 can also be obtained according to the time-frequency duality property, based on the Fourier transform of the unit impulse:

$$\mathcal{F}[1] = \int_{-\infty}^{\infty}e^{-j2\pi ft}\,dt = \delta(f) \qquad (3.143)$$

Due to the time-frequency scaling property, if the time function $x(t)$ is scaled by a factor of $1/2\pi$ to become $x(t/2\pi)$, its spectrum $X(f)$ becomes $2\pi X(2\pi f) = 2\pi X(\omega)$. Specifically, in this case, if we scale the constant 1 as a time function by $1/2\pi$ (still the same constant), its spectrum $X(f) = \delta(f)$ can be expressed as a function of the angular frequency as $X(\omega) = 2\pi\delta(\omega)$.

- Complex exponentials and sinusoids: the Fourier transform of a complex exponential $x(t) = e^{j\omega_0 t} = e^{j2\pi f_0 t}$ of frequency $f_0$ is

$$\mathcal{F}[e^{j2\pi f_0 t}] = \int_{-\infty}^{\infty}e^{-j2\pi(f - f_0)t}\,dt = \delta(f - f_0) \qquad (3.144)$$

According to Euler's formula, the Fourier transform of the cosine function $x(t) = \cos(2\pi f_0 t)$ is

$$\mathcal{F}[\cos(2\pi f_0 t)] = \frac{1}{2}\left[\delta(f - f_0) + \delta(f + f_0)\right] \qquad (3.145)$$

and similarly the Fourier transform of $x(t) = \sin(2\pi f_0 t)$ is

$$\mathcal{F}[\sin(2\pi f_0 t)] = \frac{1}{2j}\left[\delta(f - f_0) - \delta(f + f_0)\right] \qquad (3.146)$$

Note that the sine and cosine functions are respectively odd and even, and so are their Fourier spectra. Also note that none of the step, constant, complex exponential, and sinusoidal functions considered above is square-integrable, and correspondingly their Fourier transform integrals are only marginally convergent, in the sense that their transforms $X(f)$ all contain a delta function ($\delta(f)$, $\delta(f - f_0)$, etc.) with an infinite value at a certain frequency.

- Exponential functions: a right-sided exponential decay function is defined as $e^{-at}u(t)$ $(a > 0)$, and its Fourier transform can be found to be

$$\mathcal{F}[e^{-at}u(t)] = \int_0^{\infty}e^{-at}\,e^{-j2\pi ft}\,dt = \left[\frac{-e^{-(a + j2\pi f)t}}{a + j2\pi f}\right]_0^{\infty} = \frac{1}{a + j2\pi f} = \frac{1}{a + j\omega} = \frac{a - j\omega}{a^2 + \omega^2} \qquad (3.147)$$

As $\lim_{a\to 0}e^{-at}u(t) = u(t)$, we have

$$\mathcal{F}[u(t)] = \lim_{a\to 0}\mathcal{F}[e^{-at}u(t)] = \lim_{a\to 0}\left[\frac{1}{a + j2\pi f}\right] = \frac{1}{2}\delta(f) + \frac{1}{j2\pi f} \qquad (3.148)$$

which is the same as Eq. 3.72. Note that it is tempting to assume that at the limit $a = 0$ the second term alone results, while in fact the first term $\delta(f)/2$ is also necessary. The proof of this result is left to the reader as a homework problem.

Next, consider a left-sided exponential decay function $e^{at}u(-t)$, the time-reversed version of the right-sided decay function. According to the time reversal property $\mathcal{F}[x(-t)] = X(-f)$, we get

$$\mathcal{F}[e^{at}u(-t)] = \frac{1}{a - j2\pi f} = \frac{1}{a - j\omega} \qquad (3.149)$$

Finally, a two-sided exponential decay $e^{-a|t|}$ is the sum of the right-sided and left-sided decay functions, and according to the linearity property its Fourier transform can be obtained as

$$\mathcal{F}[e^{-a|t|}] = \mathcal{F}[e^{-at}u(t)] + \mathcal{F}[e^{at}u(-t)] = \frac{1}{a + j2\pi f} + \frac{1}{a - j2\pi f} = \frac{2a}{a^2 + (2\pi f)^2} = \frac{2a}{a^2 + \omega^2} \qquad (3.150)$$

(A numerical check of Eq. 3.147 is sketched at the end of this subsection.)

- Rectangular function and sinc function: a rectangular function, also called a square impulse, of width $\tau$ is defined as

$$\mathrm{rect}_\tau(t) = \begin{cases}1 & |t| < \tau/2\\ 0 & \text{else}\end{cases} \qquad (3.151)$$

which can be considered as the difference between two unit step functions:

$$\mathrm{rect}_\tau(t) = u(t + \tau/2) - u(t - \tau/2) \qquad (3.152)$$

Due to the properties of linearity and time shift, the spectrum of $\mathrm{rect}_\tau(t)$ can be found to be

$$\mathcal{F}[\mathrm{rect}_\tau(t)] = \mathcal{F}[u(t + \tau/2)] - \mathcal{F}[u(t - \tau/2)] = \frac{e^{j\pi f\tau}}{j2\pi f} - \frac{e^{-j\pi f\tau}}{j2\pi f} = \tau\,\frac{\sin(\pi f\tau)}{\pi f\tau} = \tau\,\mathrm{sinc}(f\tau) \qquad (3.153)$$

This spectrum is zero at $f = k/\tau$ for any nonzero integer $k$. If we let the width $\tau \to \infty$, the rectangular function becomes a constant 1 and its spectrum becomes an impulse. If we divide both sides of the equation above by $\tau$ and let $\tau \to 0$, the time function becomes an impulse and its spectrum a constant.

As both the rectangular function and the sinc function are symmetric, the time-frequency duality property applies: the Fourier spectrum of a sinc function in the time domain is a rectangular function in the frequency domain, called an ideal low-pass filter,

$$H_{lp}(f) = \begin{cases}1 & |f| < f_c\\ 0 & |f| > f_c\end{cases} \qquad (3.154)$$

where $f_c$ is called the cutoff frequency. According to time-frequency duality, its impulse response in time is

$$h_{lp}(t) = \frac{\sin(2\pi f_c t)}{\pi t} = 2f_c\,\mathrm{sinc}(2f_c t) \qquad (3.155)$$

Note that the impulse response $h_{lp}(t)$ is nonzero for $t < 0$, indicating that the ideal low-pass filter is not causal (it responds before the input $\delta(t)$ arrives at $t = 0$). In other words, an ideal low-pass filter is impossible to implement in real time, although it can be trivially realized off-line in the frequency domain.

- Triangle function:

$$\mathrm{triangle}(t) = \begin{cases}1 - |t|/\tau & |t| < \tau\\ 0 & |t| \ge \tau\end{cases} \qquad (3.156)$$

Following the definition, the spectrum of the triangle function, an even function, can be obtained as

$$\mathcal{F}[\mathrm{triangle}(t)] = 2\int_0^{\tau}(1 - t/\tau)\cos(2\pi ft)\,dt = 2\left[\int_0^{\tau}\cos(2\pi ft)\,dt - \frac{1}{\tau}\int_0^{\tau}t\cos(2\pi ft)\,dt\right]$$
$$= \frac{1}{\pi f}\left[\sin(2\pi f\tau) - \frac{t}{\tau}\sin(2\pi ft)\Big|_0^{\tau} + \frac{1}{\tau}\int_0^{\tau}\sin(2\pi ft)\,dt\right] = -\frac{1}{2\tau(\pi f)^2}\cos(2\pi ft)\Big|_0^{\tau}$$
$$= \frac{1}{2\tau(\pi f)^2}\left(1 - \cos(2\pi f\tau)\right) = \tau\,\frac{\sin^2(\pi f\tau)}{(\pi f\tau)^2} = \tau\,\mathrm{sinc}^2(f\tau) \qquad (3.157)$$

Alternatively, the triangle function (with width $2\tau$) can be obtained more easily as the convolution of two rectangular functions (each of width $\tau$), scaled by $1/\tau$:

$$\mathrm{triangle}(t) = \frac{1}{\tau}\,\mathrm{rect}_\tau(t) * \mathrm{rect}_\tau(t) \qquad (3.158)$$

and its Fourier transform can be conveniently obtained based on the convolution theorem:

$$\mathcal{F}[\mathrm{triangle}(t)] = \frac{1}{\tau}\,\mathcal{F}[\mathrm{rect}_\tau(t)]\,\mathcal{F}[\mathrm{rect}_\tau(t)] = \frac{1}{\tau}\,\tau\,\mathrm{sinc}(f\tau)\;\tau\,\mathrm{sinc}(f\tau) = \tau\,\mathrm{sinc}^2(f\tau) \qquad (3.159)$$

- Gaussian function: consider the Gaussian function $x(t) = e^{-\pi(t/a)^2}/a$. Note that, in particular, when $a = \sqrt{2\pi\sigma^2}$, $x(t)$ becomes the normal distribution with variance $\sigma^2$ and mean $\mu = 0$. The spectrum of $x(t)$ is

$$X(f) = \mathcal{F}\left[\frac{1}{a}e^{-\pi(t/a)^2}\right] = \frac{1}{a}\int_{-\infty}^{\infty}e^{-\pi(t/a)^2}\,e^{-j2\pi ft}\,dt = \frac{1}{a}\int_{-\infty}^{\infty}e^{-\pi[(t/a)^2 + j2ft]}\,dt$$
$$= \frac{1}{a}e^{\pi(jaf)^2}\int_{-\infty}^{\infty}e^{-\pi[(t/a)^2 + j2ft + (jaf)^2]}\,dt = e^{-\pi(af)^2}\int_{-\infty}^{\infty}e^{-\pi(t/a + jaf)^2}\,d(t/a + jaf) = e^{-\pi(af)^2} \qquad (3.160)$$

The last equality is due to the identity $\int_{-\infty}^{\infty}e^{-\pi x^2}\,dx = 1$. We see that the Fourier transform of a Gaussian function is another Gaussian function; moreover, the area under $x(t)$ is unity, as reflected by $X(0) = 1$. If we let $a \to 0$, $x(t)$ approaches $\delta(t)$ while its spectrum $e^{-\pi(af)^2}$ approaches 1. On the other hand, if we rewrite the above as

$$X(f) = \mathcal{F}[x(t)] = \mathcal{F}[e^{-\pi(t/a)^2}] = a\,e^{-\pi(af)^2} \qquad (3.161)$$

and let $a \to \infty$, then $x(t)$ approaches 1 and $X(f)$ approaches $\delta(f)$.

- Impulse train:

Figure 3.6 Impulse train and its spectrum

As discussed before, the impulse train is a sequence of infinitely many unit impulses separated by a constant time interval $T$:

$$\mathrm{comb}(t) = \sum_{n=-\infty}^{\infty}\delta(t - nT) \qquad (3.162)$$

The Fourier transform of this function is

$$\mathcal{F}[\mathrm{comb}(t)] = \int_{-\infty}^{\infty}\mathrm{comb}(t)\,e^{-j2\pi ft}\,dt = \sum_{n=-\infty}^{\infty}\int_{-\infty}^{\infty}\delta(t - nT)\,e^{-j2\pi ft}\,dt = \sum_{n=-\infty}^{\infty}e^{-j2\pi nfT} = \frac{1}{T}\sum_{n=-\infty}^{\infty}\delta(f - n/T) = f_0\sum_{n=-\infty}^{\infty}\delta(f - nf_0) \qquad (3.163)$$

where we have used Eq. 1.35 with $F$ replaced by $f_0$. This equation, also called the Poisson formula, is very useful in the discussion of impulse trains.

- Periodic signals: as discussed before, a periodic signal $x_T(t + T) = x_T(t)$ can be expanded into a Fourier series with coefficients $X[k]$, as shown in Eq. 3.6. We can also consider this periodic signal as the convolution of a finite signal $x(t)$, which is zero outside the interval $0 < t < T$, and an impulse train with the same interval:

$$x_T(t) = x(t) * \sum_{n=-\infty}^{\infty}\delta(t - nT) \qquad (3.164)$$

This is illustrated in Fig. 3.7. According to the convolution theorem, the Fourier transform of this periodic signal can be found to be

$$\mathcal{F}[x_T(t)] = \mathcal{F}\left[x(t) * \sum_{n=-\infty}^{\infty}\delta(t - nT)\right] = \mathcal{F}[x(t)]\;\mathcal{F}\left[\sum_{n=-\infty}^{\infty}\delta(t - nT)\right] \qquad (3.165)$$

Here the two Fourier transforms on the right-hand side are, respectively,

$$\mathcal{F}[x(t)] = \int_0^T x(t)\,e^{-j2\pi ft}\,dt \qquad (3.166)$$

Periodic signals:
As discussed before, a periodic signal x_T(t + T) = x_T(t) can be expanded into a Fourier series with coefficients X[k], as shown in Eq. 3.6. We can also consider this periodic signal as the convolution of a finite signal x(t), which is zero outside the interval 0 < t < T, and an impulse train with the same interval:

    x_T(t) = x(t) ∗ Σ_{n=-∞}^{∞} δ(t - nT)                                   (3.164)

This is illustrated in Fig. 3.7.

Figure 3.7  Generation of a periodic signal.

According to the convolution theorem, the Fourier transform of this periodic signal can be found to be:

    F[x_T(t)] = F[x(t) ∗ Σ_{n=-∞}^{∞} δ(t - nT)] = F[x(t)] F[Σ_{n=-∞}^{∞} δ(t - nT)]   (3.165)

Here the two Fourier transforms on the right-hand side are, respectively:

    F[x(t)] = ∫_0^T x(t) e^{-j2πft} dt                                       (3.166)

and (Eq. 3.163)

    F[Σ_{n=-∞}^{∞} δ(t - nT)] = (1/T) Σ_{k=-∞}^{∞} δ(f - kf_0)               (3.167)

Substituting these into Eq. 3.165, we get:

    F[x_T(t)] = [∫_0^T x(t) e^{-j2πft} dt] [(1/T) Σ_{k=-∞}^{∞} δ(f - kf_0)]
              = Σ_{k=-∞}^{∞} [(1/T) ∫_0^T x(t) e^{-j2πkf_0 t} dt] δ(f - kf_0)
              = Σ_{k=-∞}^{∞} X[k] δ(f - kf_0)                                (3.168)

where f_0 = 1/T is the fundamental frequency. This result indicates that a periodic signal has a discrete spectrum, which can be represented as an impulse train in frequency weighted by the Fourier coefficients X[k].

Figure 3.8  A periodic signal and its spectrum.

As an example, a square impulse and its periodic version are shown on the left of Fig. 3.8, and their corresponding spectra are shown on the right. We see that the spectrum of the periodic version is composed of a set of impulses, weighted by the spectrum X(f) = F[x(t)] of the single impulse.
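Eq. (3.168) says the line spectrum of x_T(t) samples the spectrum of the single period: X[k] = (1/T) X(f) evaluated at f = k/T. The sketch below is illustrative only; the square impulse of width T/4 and the grid sizes are arbitrary choices. It computes the Fourier series coefficients of a periodic square wave by direct integration over one period and compares them with the sampled, 1/T-scaled sinc spectrum of the single impulse.

    import numpy as np

    T, w = 2.0, 0.5                           # period and impulse width (w = T/4)
    t = np.linspace(0.0, T, 20001)            # one period
    dt = t[1] - t[0]
    x = np.where(t < w, 1.0, 0.0)             # one period of the square wave

    k = np.arange(-8, 9)
    f0 = 1.0/T
    # Fourier series coefficients X[k] = (1/T) * integral over one period
    Xk = np.array([np.sum(x*np.exp(-2j*np.pi*kk*f0*t))*dt for kk in k]) / T

    # Spectrum of the single impulse: X(f) = w e^{-j pi f w} sinc(f w),
    # sampled at f = k f0 and scaled by 1/T as in Eq. (3.168)
    Xf = w*np.exp(-1j*np.pi*k*f0*w)*np.sinc(k*f0*w)
    print(np.max(np.abs(Xk - Xf/T)))          # small: line spectrum samples X(f)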

Fig. 3.9 shows a set of typical signals on the left and their Fourier spectra on the right.

Figure 3.9  Examples of continuous-time Fourier transforms. A set of signals are shown on the left and their Fourier spectra on the right (real and imaginary parts are shown in solid and dashed lines, respectively).

The Uncertainty Principle

According to the property of time and frequency scaling (Eq. 3.98), if a time function x(t) is expanded (a < 1), its spectrum X(f) will be compressed; conversely, if x(t) is compressed (a > 1), X(f) will be expanded. This property indicates that if the energy of a signal is mostly concentrated within a short time range, then the energy of its spectrum is spread over a wide frequency range, and vice versa.
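This trade-off can be made quantitative by measuring root-mean-square widths in time and in frequency. The sketch below is illustrative only; the Gaussian test signal, the widths a, the helper name rms_widths, and the grids are all arbitrary choices, not from the text. It scales a Gaussian and shows that the duration-bandwidth product stays constant while the two widths move in opposite directions.

    import numpy as np

    t = np.linspace(-10.0, 10.0, 8001)
    dt = t[1] - t[0]
    f = np.linspace(-8.0, 8.0, 4001)

    def rms_widths(a):
        # RMS time and frequency widths of x(t) = e^{-pi (t/a)^2}
        x = np.exp(-np.pi*(t/a)**2)
        X = np.array([np.sum(x*np.exp(-2j*np.pi*fk*t))*dt for fk in f]).real
        st = np.sqrt(np.sum(t**2 * x**2)/np.sum(x**2))   # time spread
        sf = np.sqrt(np.sum(f**2 * X**2)/np.sum(X**2))   # frequency spread
        return st, sf

    for a in (0.5, 1.0, 2.0):
        st, sf = rms_widths(a)
        # compressing in time (small a) expands the spectrum, and vice versa;
        # the product is invariant (1/(4 pi) for the Gaussian)
        print(a, round(st, 4), round(sf, 4), round(st*sf, 4))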
