New Concepts in Frame Theory Motivated by Acoustical Applications

Peter Balazs
Habilitation thesis
Universität Wien, Fakultät für Mathematik
Vienna, March 3, 2011

Chapter 1

Preface

Application-oriented mathematics develops theoretical results and new mathematical concepts motivated by applications, in contrast to applied mathematics, which focuses only on providing and applying mathematical tools for the applied sciences. The application-oriented approach produces results significant both for mathematics and the applied sciences. In this context we developed new concepts in frame theory motivated by signal processing and acoustical applications.

Frames are generalizations of bases, and give more freedom for the analysis and modification of information. The concept of frames is a theoretical background for signal processing. On the other hand, signal processing algorithms and processes are essential for applications in audio and acoustics. Linking the mathematical frame theory, the signal processing algorithms, their implementations and finally acoustical applications leads to a very promising, synergetic combination of research in different fields, which has not been fully exploited yet.

To establish that link a thorough investigation of the theory is important. So we have investigated topics in frame theory, extending the standard mathematical concepts. As a particular case of analysis and synthesis systems we have researched mathematical topics in time-frequency analysis. Furthermore, a major focus was the mathematical theory of multipliers, which are operators created by combining frame analysis, multiplication and resynthesis. To show that frame theory is important for applications we have included two applied topics, both of which apply Gabor frame multipliers in acoustical projects.

The focus of our work is also the focus of this habilitation thesis and can be summarized by the following grouping:

Theory:
- Frame Theory [1-3]
- Time-Frequency Analysis [4-5]
- Frame Multipliers [6-8]

Acoustical Applications:
- Time-Frequency Sparsity by Perceptual Irrelevance [9-10]
- Acoustic System Estimation [11-12]

Please note that we use numeric references, e.g. [1], for the papers included in this habilitation thesis, while all other references are cited by name and year of publication, e.g. [Balazs, 2007].

Scientific Achievement

One of the first scientific goals assigned to me at the Acoustics Research Institute of the Austrian Academy of Sciences, after finishing my mathematical studies, was to find a precise formulation of the heuristic irrelevance algorithm developed in [Eckel, 1989]. This was one of the reasons why I became interested in the topic of multipliers, and it took me some years to reach this goal in [9]. So the application-oriented mathematics approach was present right at the start of my career and it continues to be the foundation of my scientific work.

With my PhD, I took the first steps towards connecting mathematics, signal processing and psychoacoustics. The novel ideas it contained were subsequently extended in several journal publications. In particular, the idea of double preconditioning [Balazs et al., 2006] resulted in a new approach for an efficient algorithm to calculate the perfect reconstruction window for a Gabor transform. While also being useful for my own work, e.g. for [1,9], it has inspired several colleagues in their research combining Gabor theory and numerical mathematics, as shown by many citations: [Werther et al., 2005] (citing the preprint) and [Søndergaard, 2007, Janssen and Søndergaard, 2007, Hampejs and Kracher, 2007, Chai et al., 2008, Mi et al., 2009b, Mi et al., 2009a, Cheng et al., 2009, Chai et al., 2010, Moreno-Picot et al., 2010, Dörfler, 2010].

I was fascinated by the beautiful theory of frames and related sequences when reading [Christensen, 2003, Casazza, 2000, Gröchenig, 2001].
This concept was not only an enchanting abstract theory, it also had a connection to applications in acoustics. The fascination with this mathematical theory led to an investigation of semi-frames [2] and of the relation between the properties of sequences and the associated (frame-related) operators [Balazs and El-Gebeily, 2008] [3], which were studied purely out of mathematical interest. I am quite happy that even small results, like the investigation of the connection between frames and finite dimensionality [Balazs, 2008a], have received some recognition by the scientific community [Cotfas and Gazeau, 2010, Rahimi, 2009, Špiřík et al., 2010]. Connected to the theory of frame multipliers, the concept of weighted frames

was investigated [1]. This work was again used as a basis for many of my own papers, e.g. [2,6], but also found recognition in [Aceska, 2009, Antoine and Vandergheynst, 2007] (as preprint).

I developed the novel concept of frame multipliers in [Balazs, 2007] by generalizing Gabor multipliers to the general frame case. This work, too, was the basis for many of my later papers, e.g. [6-8], and was also cited by other authors [Rudol, 2011, Arias and Pacheco, 2008, Ambroziski and Rudol, 2009, Rahimi, 2009, Dörfler, 2010]. Within this topic of frame multipliers, I have shown further mathematical results, for example finding the best approximation in the Hilbert-Schmidt setting [Balazs, 2008b]. Here, too, I can point to several citations by other authors [Arias and Pacheco, 2008, Chen et al., 2009a, Dörfler and Torrésani, 2010, Aceska, 2009, Li, 2009, Xiao et al., 2009, Chen et al., 2009b, Ahmad and Iqbal, 2009, Rahimi, 2009]. This generalization of the concept of Gabor multipliers was motivated by applications, as for many acoustical challenges other analysis systems like wavelets or auditory filterbanks are often advantageous. Also in these cases a simple modification by analysis, multiplication and resynthesis would be a powerful tool. For the new concept of frame multipliers basic properties were shown. Furthermore, it was established that these results do not depend on an underlying group structure, but can be shown for general frames.

The natural extension of the standard approach to a frame representation of operators in [Balazs, 2008c] is an abstract mathematical topic and was started as purely mathematical fundamental research. But later the connection to the Galerkin approach in the boundary element method (BEM), see e.g. [Gaul et al., 2003], became apparent. BEM is used for finding numerical solutions to operator equations, and its connection to frame theory will be further developed in future research (sketched in [Rieckh et al., 2010a]).
[Balazs, 2008c] was used in some of my own work, and was also cited by [M. L. Arias and Pacheco, 2007, Ambroziski and Rudol, 2009, Rahimi, 2009, Dörfler and Torrésani, 2010, Rudol, 2011].

I have shown the importance of mathematical approaches for applications in the estimation of the perceptual irrelevance in the time-frequency plane based on a simple simultaneous masking model [9]. This is the basis for future work using current psychoacoustical experiments (as in [10]) and a perceptually based filterbank (based on an implementation of a nonstationary Gabor frame [5]).

Also included in this habilitation thesis is the Multiple Exponential Sweep Method (MESM) [11], a system identification method for weakly non-linear, weakly time-variant systems. This method relies on a time-frequency motivated approach. It has been applied in the research of the Acoustics Research Institute several times [Majdak et al., 2011, Majdak et al., 2010], and was also cited

by [Rébillat et al., 2011, Farina, 2009, Enzner, 2009, Søndergaard, 2007, Rébillat et al., 2010, Weinzierl et al., 2009, Pulkki et al., 2010]. Due to my pluridisciplinary orientation, I was also able to introduce a mathematical viewpoint to other applied topics and to create novel methods for them, as in the simulation of vibrations, see [Balazs et al., 2007] (cited by [Hähnel, 2010]) and [Kreuzer et al., 2011], and in the estimation of a vocal tract model [Marelli and Balazs, 2010]. Currently the usefulness of the latter method for forensic speech comparison is being investigated [Enzinger et al., 2011].

The relevance of the mathematical topics mentioned above is confirmed by a variety of projects realized in the last few years. In particular, I was able to attract funding for the project Frame Multipliers: Theory and Applications in Acoustics within the call Mathematics and... as a High Potential, which gave me the possibility to create the working group Mathematics and Signal Processing in Acoustics at the Acoustics Research Institute. I have been establishing active cooperations with internationally renowned scientists. This eagerness to cooperate on an international level can also be seen in numerous talks and 18 proceedings publications for conferences and workshops, as well as in being a partner in funded projects organized by other scientists. I have also been the organizer of a number of workshops. At the start of my career, I was employed in Marseille and Louvain-la-Neuve.

The subset of papers chosen for this habilitation thesis, out of the 15 published (or accepted) and 8 submitted journal papers, was selected because of their topical connection, as well as some habilitation regulations.
Please note that the PhD thesis [Balazs, 2005] included a lot of material that directly led to five subsequent journal publications [Balazs et al., 2006, Balazs, 2007, Balazs, 2008c, Balazs, 2008b, Balazs, 2008a], which, due to habilitation regulations, are not included here. Thus, although I can refer to 15 submitted or accepted journal papers as well as 7 peer-reviewed proceedings publications, rather many recently accepted or submitted papers can be found in the list of the papers included in this thesis. For the updated status of my publications, please refer to

Frames Theory for Acoustical Applications

While we have addressed the mathematical importance of our work in the last section, here we would like to explain in more detail why the particular connection between mathematics and acoustics chosen in my personal work had (and still has) a powerful synergetic effect.

We live in the age of information where the analysis, classification, and transmission of information is of essential importance. Signal processing tools and algorithms form the backbone of important technologies like MP3, digital television, mobile phones and wireless networking. Many signal processing algorithms have been adapted for applications in audio engineering and acoustics, also taking into account the properties of the human auditory system. The mathematical concept of frames is an important theoretical background for sampling theory and signal processing. Frames are generalizations of bases that give more freedom for the analysis and modification of information; however, this concept is still not firmly rooted in applied research. Our past experience in the work on scientific projects has shown that linking mathematical frame theory, signal processing algorithms, their implementations and finally acoustical applications leads to a very promising, synergetic combination of research in different fields.

During the years I have been working in application-oriented mathematics for acoustics, I have made the following three observations regarding the link between theoretical and applied research:

(1.) Frame theory is very useful but not fully understood in applications: Frames very often occur in signal processing and acoustical applications. They have been implicitly used for many years without fully exploiting the related theory. Using analysis / synthesis systems other than orthonormal bases is sometimes seen as problematic in applied sciences. The mathematical theory provides enough knowledge to establish that frames are an applicable, stable and favorable tool for applications. The link from frame theory to signal processing and from signal processing to acoustical applications is partially recognized, but needs further strengthening.
The full link between all three fields leads to very promising pluri-disciplinary research and is a novel approach.

(2.) Understanding the mathematical theory improves modeling in applications: The results of frontier research in mathematical theory are often not directly and immediately adaptable to given applications. But, given a thematic framework, the abstraction level and deep understanding of the theory needed for those results are of essential importance in the modeling and implementation stage for applications. Many applied sciences in acoustics measure empirical data and formulate heuristic models, usually with a modest mathematical basis. Mathematically precise statements considerably enhance the precision and stability of algorithms and models and can already be implemented at an early stage.

(3.) Applications lead to interesting mathematical questions: On the other

hand, the acoustical applications often raise mathematical questions, which by themselves can be very interesting on an abstract mathematical level. Those questions might not have arisen in a purely theoretical setting.

The work that led to this habilitation thesis started with the observation that many of the methods developed and applied in acoustics employ time-frequency analysis / synthesis systems, often with a possible modification in between. A typical example is the phase vocoder, see e.g. [Dolson, 1986]. The importance of perfect reconstruction in analysis / synthesis systems and the scientific interest in the abstract theory behind it led to investigations in frame theory, also connected to frame multipliers. For audio applications the natural setting for analysis is the time-frequency plane, so we also studied Gabor frames and Gabor multipliers. The fascination with abstract frame theory led to the development of the concept of frame multipliers [Balazs, 2007], extended and used in [Balazs, 2008b] and [1,6]. While some of these investigations led to results in an abstract setting, the basic motivation still came from acoustical applications. Even if this abstract analysis did not lead to results which could be directly applied, the abstract and theoretical treatment of this topic helped in handling these concepts in an applied setting, in the sense of observation (2) above.

The inversion of a system is an important topic in many applications, for example in vibration modeling. This, again in the beautiful abstract frame theory setting, led to the investigation of the invertibility of multipliers [7-8], which is intended to be the basis for future implementations usable in acoustical applications.

Applications need implementations. Within the Acoustics Research Institute all developed algorithms are integrated into the STx software system [Balazs and Noll, 2003, Noll et al., 2007].
While the investigation of the double preconditioning algorithm [Balazs et al., 2006] was first motivated by the goal of speeding up algorithms in STx, it led to fundamental numerical and mathematical research, of which only the most basic approach is needed and implemented in STx. Integrating algorithms in a supported software system keeps the code available and accessible. It also shows the relevance of the developed methods. Therefore current (and future) methods, e.g. based on research in [10] and [12], will be included there. Furthermore, all the algorithms I develop are and will be incorporated in the Linear Time-Frequency Analysis Toolbox (LTFAT) [Søndergaard, 2007, Soendergaard et al., 2010]. This is open source software, which is therefore used both by applied and mathematical researchers.

Because of the above-mentioned importance of analysis / synthesis systems with possible modification, as well as of time-frequency representations,

Gabor frame multipliers are a very useful method to realize time-variant filters. They are applied in the topics of perceptual irrelevance models [9-10] and system identification by exponential sweeps [11-12]. In these settings theory and applications converge more closely, beyond the mere conceptual connection mentioned in observation (2). As mentioned above, not all theoretical results are directly useful for applications, apart from providing a better grasp of the basic ideas and concepts. In [9] and in [12], however, rather recently investigated theoretical properties were utilized. This resulted in methods which could not have been created without the mathematical background.

Acknowledgments

I thank all my co-authors and cooperation partners, who provided me with a lot of productive ideas, comments and research projects. Because there are too many to mention (and also for that I am very, very thankful) let me just state it like this: Thank you, friends! This goes, in particular, to all the co-authors of papers included in this habilitation thesis. I thank the Acoustics Research Institute, in particular Werner A. Deutsch, for providing me with perfect conditions for working in a productive and open pluri-disciplinary environment. I warmly thank Hans G. Feichtinger for introducing me to this wonderful part of science, connecting mathematics to applications, and for his continuous support since the start of my PhD. I thank Hans G. Feichtinger, K. Gröchenig and G. Rieckh for providing useful comments and suggestions on this document, as well as T. Krutzler and D. Stoeva for proof-reading.

Part of the work leading to this thesis was supported by the European Union's Human Potential Programme, under contract HPRN-CT (HASSIP), the WWTF project MULAC (Frame Multipliers: Theory and Application in Acoustics; MA07-025) and the WTZ Amadée project 1/2006.
I gratefully acknowledge the hospitality of the Groupe de Traitement du Signal, Laboratoire d'Analyse Topologie et Probabilités, CMI, Université de Provence, the group Modélisation, Synthèse et Contrôle des Signaux Sonores et Musicaux, Laboratoire de Mécanique et d'Acoustique, CNRS Marseille, and the Institut de Recherche en Mathématique et Physique, Université catholique de Louvain. Throughout my scientific life I have been supported by the Acoustics Research Institute of the Austrian Academy of Sciences. I especially thank my family, Claudia, Barbara and Michael, for making my life rich, also outside mathematics and acoustics.

List of Included Papers

Mathematical Theory

Frame Theory

[1] Weighted and Controlled Frames: Mutual Relationship and first Numerical Properties (with J.-P. Antoine and A. Grybos), International Journal of Wavelets, Multiresolution and Information Processing, Volume 8 (1) (2010)

[2] Frames and Semi-Frames (with J.-P. Antoine), arXiv preprint (v1), submitted to Journal of Physics A: Mathematical and Theoretical (2011)

[3] Classification of General Sequences by Frame-Related Operators (with D. Stoeva and J.-P. Antoine), Sampling Theory in Signal and Image Processing (STSIP), to appear (2011)

Time-Frequency Analysis

[4] The Phase Derivative Around Zeros of the Short-Time Fourier Transform (with D. Bayer, F. Jaillet and P. Søndergaard), submitted to Advances in Pure and Applied Mathematics (2011)

[5] Non-stationary Gabor Frames (with F. Jaillet and M. Dörfler), SAMPTA'09, International Conference on SAMPling Theory and Applications, proceedings (2009) (peer-reviewed proceedings paper; an extended journal paper is in preparation)

Theory of Frame Multipliers

[6] Multipliers for p-Bessel sequences in Banach spaces (with A. Rahimi), Integral Equations and Operator Theory, Volume 68 (2) (2010)

[7] Unconditional convergence and invertibility of multipliers (with D. Stoeva), arXiv preprint (v3), in revision for Applied and Computational Harmonic Analysis (2010)

[8] Detailed characterization of conditions for the unconditional convergence and invertibility of multipliers (with D. Stoeva), arXiv preprint (v1), submitted to Complex Analysis and Operator Theory (2011)

Applications in Acoustics

Time-Frequency Sparsity by Perceptual Irrelevance

[9] Time-Frequency Sparsity by Removing Perceptually Irrelevant Components Using a Simple Model of Simultaneous Masking (with B. Laback, G. Eckel and W. Deutsch), IEEE Transactions on Audio, Speech and Language Processing, Vol. 18 (1) (2010)

[10] Additivity of nonsimultaneous masking for short Gaussian-shaped sinusoids (with B. Laback, T. Necciari, S. Savel, S. Ystad, S. Meunier and R. Kronland-Martinet), The Journal of the Acoustical Society of America, to appear (2011)

Acoustic System Estimation

[11] Multiple Exponential Sweep Method for Fast Measurement of Head Related Transfer Functions (with P. Majdak and B. Laback), Journal of the Audio Engineering Society, Vol. 55, No. 7/8, July/August 2007

[12] A Time-Frequency Method for Increasing the Signal-To-Noise Ratio in System Identification with Exponential Sweeps (with P. Majdak, W. Kreuzer and M. Dörfler), 36th International Conference on Acoustics, Speech and Signal Processing ICASSP 2011, Prague, to appear, 2011 (peer-reviewed proceedings paper; an extended journal paper is in preparation)

Contents

1 Preface
2 Introduction and Summary
  2.1 Frame Theory
    2.1.1 State of the Art
    2.1.2 Weighted and Controlled Frames: Mutual Relationship and first Numerical Properties [1]
    2.1.3 Frames and Semi-Frames [2]
    2.1.4 Classification of General Sequences by Frame-Related Operators [3]
  2.2 Time-Frequency Analysis
    2.2.1 State of the Art
    2.2.2 The Phase Derivative Around Zeros of the Short-Time Fourier Transform [4]
    2.2.3 Non-Stationary Gabor Frames [5]
  2.3 Theory of Frame Multipliers
    2.3.1 State of the Art
    2.3.2 Multipliers for p-Bessel sequences in Banach spaces [6]
    2.3.3 Unconditional Convergence and Invertibility of Multipliers [7]
    2.3.4 Detailed characterization of conditions for the unconditional convergence and invertibility of multipliers [8]
  2.4 Applications in Acoustics: Time-Frequency Sparsity by Perceptual Irrelevance
    2.4.1 State of the Art
    2.4.2 Time-Frequency Sparsity by Removing Perceptually Irrelevant Components Using a Simple Model of Simultaneous Masking [9]
    2.4.3 Additivity of nonsimultaneous masking for short Gaussian-shaped sinusoids [10]
  2.5 Applications in Acoustics: Acoustic System Estimation
    2.5.1 State of the Art
    2.5.2 Multiple Exponential Sweep Method for Fast Measurement of Head Related Transfer Functions [11]
    2.5.3 A Time-Frequency Method for Increasing the Signal-To-Noise Ratio in System Identification with Exponential Sweeps [12]

Chapter 2

Introduction and Summary

2.1 Frame Theory

2.1.1 State of the Art

A sequence $\Psi = (\psi_k)_{k \in K}$ in the Hilbert space $\mathcal{H}$ is a frame for $\mathcal{H}$, if there exist positive constants $A_\Psi$ and $B_\Psi$ (called lower and upper frame bound, respectively) that satisfy

$$A_\Psi \|f\|^2 \le \sum_{k \in K} |\langle f, \psi_k \rangle|^2 \le B_\Psi \|f\|^2 \quad \forall f \in \mathcal{H}. \tag{2.1}$$

If at least the upper (or the lower) inequality is fulfilled this sequence is called a Bessel sequence (or a lower frame sequence, respectively). A frame that is not a basis is called over-complete. A frame where the two bounds can be chosen to be equal, i.e. $A_\Psi = B_\Psi$, is called tight.

By $C_\Psi : \mathcal{H} \to \ell^2$ we denote the analysis operator defined by $(C_\Psi f)_k = \langle f, \psi_k \rangle$. (Here and in the following we denote by $\ell^p$ the $p$-summable sequences, and by $c_0$ the sequences converging to zero.) The adjoint of $C_\Psi$ is the synthesis operator $D_\Psi (c_k) = \sum_k c_k \psi_k$. The frame operator $S_\Psi = D_\Psi C_\Psi$ can be written as $S_\Psi f = \sum_k \langle f, \psi_k \rangle \psi_k$. The Gram matrix $(G^{(\Psi)}_{k,l})_{k,l}$ is defined by $G^{(\Psi)}_{k,l} = \langle \psi_l, \psi_k \rangle$, $k, l \in K$. This matrix defines an operator on $\ell^2$ by matrix multiplication, corresponding to $G = CD$. If no confusion will arise, we will omit the indices, writing, for example, $S$ for $S_\Psi$.

Frame theory gives a stable way to reconstruct the signal perfectly from the frame coefficients $\langle f, \psi_k \rangle$ by using the canonical dual frame $(\tilde{\psi}_k)$. It is found by applying the inverse of the frame operator $S$ to the original frame elements, i.e. $\tilde{\psi}_k = S^{-1} \psi_k$ for all $k$. Then for all $f \in \mathcal{H}$ we have the reconstruction

$$f = \sum_k \langle f, \psi_k \rangle \tilde{\psi}_k = \sum_k \langle f, \tilde{\psi}_k \rangle \psi_k.$$

In this way, perfect reconstruction, which is very often a goal in signal processing analysis/synthesis systems, can be reached easily. The so-called canonical tight frame is defined by $\psi_k^{(t)} = S^{-1/2} \psi_k$.
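As a concrete illustration of the definitions above, the following minimal sketch computes the frame operator, the optimal frame bounds, the canonical dual and the canonical tight frame for a small frame of $\mathbb{R}^2$. The finite-dimensional real setting, the example vectors and all function names are assumptions made here for illustration only.

```python
import numpy as np

def frame_operator(Psi):
    """Frame operator S = D C; the rows of Psi are the frame elements psi_k."""
    return Psi.T @ Psi

def frame_bounds(Psi):
    """Optimal frame bounds: smallest and largest eigenvalue of S."""
    eigenvalues = np.linalg.eigvalsh(frame_operator(Psi))
    return eigenvalues[0], eigenvalues[-1]

def canonical_dual(Psi):
    """Rows are the dual elements S^{-1} psi_k (S is symmetric)."""
    return Psi @ np.linalg.inv(frame_operator(Psi))

def canonical_tight(Psi):
    """Rows are S^{-1/2} psi_k, computed from the eigendecomposition of S."""
    eigenvalues, U = np.linalg.eigh(frame_operator(Psi))
    S_inv_sqrt = U @ np.diag(eigenvalues ** -0.5) @ U.T
    return Psi @ S_inv_sqrt

# Hypothetical example: three vectors in R^2 (over-complete, not tight).
Psi = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

A, B = frame_bounds(Psi)
Psi_dual = canonical_dual(Psi)

f = np.array([0.3, -1.2])
coefficients = Psi @ f               # analysis: (C f)_k = <f, psi_k>
f_rec = Psi_dual.T @ coefficients    # synthesis with the canonical dual

Psi_tight = canonical_tight(Psi)
S_tight = frame_operator(Psi_tight)  # frame operator of the canonical tight frame
```

In this example the frame operator is the matrix with rows (2, 1) and (1, 2), so the optimal bounds are its eigenvalues, and the frame operator of the canonical tight frame is the identity.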

We will denote an orthonormal basis (ONB) of the Hilbert space by $E = (e_k)$, i.e. a complete sequence for which the Gram matrix is the identity.

Frames are generalizations of bases. Contrary to bases, frames lead to redundant representations. In general, the range of $C$ is a proper subset of $\ell^2$. It is equal to all of $\ell^2$ if and only if the frame is a basis. Choosing an arbitrary sequence in $\ell^2$ and applying the Gram matrix $G = CD$ maps it into $\mathrm{ran}(C)$. Even more, if the synthesis uses the canonical dual frame, i.e. for the cross-Gram matrix $G_{\Psi,\tilde{\Psi}} = C_\Psi D_{\tilde{\Psi}}$, this mapping is the orthogonal projection onto $\mathrm{ran}(C)$; for a tight frame with bound $A$ this is $G/A$. This can be called the reproducing kernel property of the over-complete frame.

Finding and constructing frames satisfying certain a-priori properties is often an easier task than doing so for bases. This can readily be experienced in time-frequency analysis. The often used Gabor transform, see Section 2.2, can be much better localized in the time-frequency domain if the associated sequence is a frame rather than a basis. This is a consequence of the well-known Balian-Low theorem, see e.g. [Gröchenig, 2001], which states that it is impossible to have a Gabor sequence which has both good time-frequency localization and the basis property. This shows two things: First, even if it is impossible to find a basis with certain properties, it can still be possible to find a frame. Secondly, analysis with redundant frames instead of bases can have the big advantage that it is easier to directly interpret the coefficients (e.g. for Gabor sequences by the time-frequency localization). This is advantageous for many applications in acoustics, and therefore frames have been implicitly used for many years, without the benefit of having the mathematical background.

Frame theory is one of the most important foundations of Gabor theory [Feichtinger and Strohmer, 1998, Gröchenig, 2001] and wavelet theory [Ali et al., 2000, Daubechies, 1992, Flandrin, 1999], see also Section 2.2.
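The relation between Gram matrices and $\mathrm{ran}(C)$ can be checked numerically. The sketch below, again for a hypothetical frame of $\mathbb{R}^2$ with names chosen here for illustration, compares the Gram matrix $G = C_\Psi D_\Psi$ with the cross-Gram matrix $C_\Psi D_{\tilde{\Psi}}$ built from the canonical dual. Only the latter is idempotent in general; $G$ itself is a projection only for Parseval frames.

```python
import numpy as np

# Hypothetical over-complete frame of R^2; rows are the frame elements.
Psi = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
S = Psi.T @ Psi                      # frame operator
Psi_dual = Psi @ np.linalg.inv(S)    # canonical dual, rows S^{-1} psi_k

G = Psi @ Psi.T          # Gram matrix, G = C_Psi D_Psi
P = Psi @ Psi_dual.T     # cross-Gram matrix, C_Psi D_dual

def is_idempotent(M):
    """True if M @ M equals M up to rounding, i.e. M is a projection."""
    return bool(np.allclose(M @ M, M))

c = np.array([1.0, -2.0, 0.5])   # arbitrary coefficient sequence
f = Psi_dual.T @ c               # synthesis with the dual frame
projected = P @ c                # coincides with the analysis coefficients of f

rank_P = np.linalg.matrix_rank(P)   # dimension of ran(C)
```

Here `P` is symmetric, idempotent and of rank 2, so it is the orthogonal projection of the three-dimensional coefficient space onto the two-dimensional range of the analysis operator. Such finite-dimensional checks only illustrate the general statements of frame theory.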
It is a highly active mathematical discipline, whose results have also proved to be relevant for signal processing, see e.g. [Bölcskei et al., 1998]. Frames also emerged in the context of the theory of (generalized) coherent states in quantum physics [Gazeau, 2009]. In this setting often continuous frames are considered, i.e., loosely speaking, the index of the frame is continuous and in Equation (2.1) integrals instead of sums are considered, see Definition 2.4.

We have investigated several topics in frame theory [Balazs and El-Gebeily, 2008, Balazs, 2008a], some of which are also included in this habilitation thesis, see the following sections and [1-3]. Several other topics are under investigation, for example related to frames of translates, where a publication is currently in submission (P. Balazs, C. Cabrelli, S. Heineken and U. Molter, Frames of Irregular Translates).

The frame representation introduced above is applied to functions, but frames can also be used to represent operators. In computational

acoustics, for example, one aims to solve operator equations numerically, for example equations for vibration modeling [Balazs et al., 2007]. Here the finite element method [Hackbusch, 2003] and the boundary element method [Sauter and Schwab, 2004] are widely used. One particular scheme to discretize the operator equations is the Galerkin method [Gaul et al., 2003]. This corresponds to taking finite sections of the standard matrix description [Gohberg et al., 2003] of an operator $O$ using an ONB (or biorthogonal basis) $(e_k)$, by constructing a matrix $M$ with the entries $M_{j,k} = \langle O e_k, e_j \rangle$. But, as was indicated before, the search for bases with certain properties, like sparsity of the system matrix, can be a very restrictive approach. The relaxation and generalization to frames can lead to more stable and faster algorithms. Using frames instead of bases led directly to the matrix representation of operators using frames in [Balazs, 2008c]. In future work this approach will be linked to adaptive frame methods [Dahlke et al., 2005] and the Galerkin method. A particular way to define operators is to apply frame analysis, multiplication and frame synthesis, which results in the concept of frame multipliers, the main topic of Section 2.3.

Frame theory is also important for numerical purposes. At first it might seem unfeasible to use redundant systems for numerical purposes. But frame theory has already been applied successfully in the field of compressed sensing/sparsity [Gribonval and Nielsen, 2003, Dahlke and Teschke, 2008]. The basic idea why frames are advantageous for sparsity can be seen in the following motivation: In a rich dictionary, with a lot of entries, it is much easier to find the correct pieces to have a short, i.e. sparse, representation of a given sentence, i.e. signal. Furthermore, as we have already noted, it is often much easier to construct frames.
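The matrix description of an operator mentioned above can be made concrete in finite dimensions. The following sketch replaces the ONB in $M_{j,k} = \langle O e_k, e_j \rangle$ by a frame and recovers the operator from the resulting matrix with the help of the canonical dual. This is one natural way to set up such a representation, assumed here for illustration; the function names and the example operator are hypothetical.

```python
import numpy as np

def canonical_dual(Psi):
    """Rows are S^{-1} psi_k, with S the frame operator of the rows of Psi."""
    return Psi @ np.linalg.inv(Psi.T @ Psi)

def operator_to_matrix(O, Psi):
    """M[j, k] = <O psi_k, psi_j>, the frame analogue of <O e_k, e_j>."""
    return Psi @ O @ Psi.T

def matrix_to_operator(M, Psi):
    """Recover O as (dual synthesis) * M * (dual analysis)."""
    Psi_dual = canonical_dual(Psi)
    return Psi_dual.T @ M @ Psi_dual

# Hypothetical example: an operator on R^2 and a redundant frame with 3 elements.
O = np.array([[2.0, 1.0], [0.0, -1.0]])
Psi = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

M = operator_to_matrix(O, Psi)       # redundant 3 x 3 description of O
O_rec = matrix_to_operator(M, Psi)   # reconstruction of the 2 x 2 operator

# For an orthonormal basis the description reduces to the operator itself.
M_onb = operator_to_matrix(O, np.eye(2))
```

The redundant matrix `M` is larger than `O` and not unique as a description, which is the price for the additional freedom in choosing the analysis and synthesis system.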
The concept of sparsity was also shown to be significant for applications in audio and acoustics, see e.g. [Daudet, 2010, Plumbley et al., 2010]. Sparsity is also relevant when solving matrix equations efficiently: hierarchical matrices (or H-matrices) [Hackbusch, 1999] use a data-sparse approach. While sparsity is included in this thesis only in the sense of perceptual sparsity, see Section 2.4, in future work we aim to connect the two sparsity concepts. We also plan to use the data-sparse approach together with adaptive frame approaches [Dahlke et al., 2005], as already sketched in [Rieckh et al., 2010a, Rieckh et al., 2010b].

We present recent extensions of frame theory in the next sections. For the summary of these ideas we need some further definitions for sequences. The sequence $\Psi = (\psi_k)$ is called

a frame sequence if it is a frame for its closed linear span;

a Riesz sequence for $H$ with bounds $A$, $B$ if $A > 0$, $B < \infty$ and
\[ A \sum |c_k|^2 \le \Big\| \sum c_k \psi_k \Big\|^2 \le B \sum |c_k|^2 \]
for all finite scalar sequences $(c_k)$ (and hence for all $(c_k) \in \ell^2$);

a Riesz-Fischer sequence with bound $A$ if $A > 0$ and
\[ A \sum |c_k|^2 \le \Big\| \sum c_k \psi_k \Big\|^2 \]
for all finite scalar sequences $(c_k)$ (and hence for all $(c_k) \in \ell^2$ such that $\sum_{k=1}^{\infty} c_k \psi_k$ converges in $H$);

a Riesz basis if it is a complete Riesz sequence;

norm-bounded below (resp. norm-bounded above) if $\inf_n \|\psi_n\| > 0$ (resp. $\sup_n \|\psi_n\| < \infty$);

norm-semi-normalized if $0 < \inf_n \|\psi_n\| \le \sup_n \|\psi_n\| < \infty$.

We call a sequence of numbers $(m_n)$ semi-normalized if $0 < \inf_n |m_n| \le \sup_n |m_n| < \infty$.

Weighted and Controlled Frames: Mutual Relationship and first Numerical Properties [1]

A sequence $\Psi = (\psi_n)$ and a complex weight $(w_n)$ are called a weighted frame if there exist constants $A > 0$ and $B < \infty$ such that
\[ A \|f\|^2 \le \sum_{n \in \Gamma} |w_n|^2 \, |\langle f, \psi_n \rangle|^2 \le B \|f\|^2 . \qquad (2.2) \]
In other words, these are sequences $(\psi_n)$ with complex weights $(w_n)$ such that the sequence $(w_n \psi_n)$ is a frame. They were introduced in the PhD thesis [Jacques, 2004] and then taken up in [Bogdanova et al., 2005] in order to obtain a numerically more efficient approximation algorithm for spherical wavelets. A similar but not equivalent concept is that of signed frames in [Peng and Waldron, 2002]. Weighted frames also occur naturally in the theory of fusion frames [Casazza and Kutyniok, 2004] as well as for Gabor frames [Gabardo, 2009] or wavelet frames [Heil and Kutyniok, 2003]. Until then, this concept had not been investigated in the context of general frame theory. By decreasing the ratio of the frame bounds, weighting can improve the numerical efficiency of iterative algorithms such as the frame algorithm [Christensen, 2003] for the inversion of the frame operator.

The works [Jacques, 2004, Bogdanova et al., 2005] introduced and used controlled frames, that is, a frame $(\psi_n)$ and an operator $T$ such that the combination of $T$ with the frame operator is positive and invertible, i.e. there exist positive constants $A_{TL}$ and $B_{TL}$ such that
\[ A_{TL} \|f\|^2 \le \sum_n \langle \psi_n, f \rangle \langle f, T\psi_n \rangle \le B_{TL} \|f\|^2 \quad \text{for all } f \in H. \qquad (2.3) \]
Since these concepts were used there just as a tool for spherical wavelets, they were not discussed in full detail.
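The effect of weighting on the ratio of the frame bounds, and thereby on the frame algorithm, can be illustrated in a small finite-dimensional example. The following is only a sketch under stated assumptions: the three vectors in $\mathbb{R}^2$, the weight choice $w_n = 1/\|\psi_n\|$ and the helper names `frame_bounds` and `frame_algorithm` are hypothetical and are not taken from [1].

```python
import numpy as np

def frame_bounds(Psi, w=None):
    """Optimal frame bounds of the weighted system (w_n psi_n), cf. (2.2).

    Rows of Psi are the vectors psi_n (real case). The frame operator is
    S = sum_n |w_n|^2 psi_n psi_n^T; the optimal bounds are its extreme eigenvalues.
    """
    w = np.ones(len(Psi)) if w is None else np.asarray(w, dtype=float)
    S = Psi.T @ (np.abs(w)[:, None] ** 2 * Psi)
    eig = np.linalg.eigvalsh(S)
    return eig[0], eig[-1], S

def frame_algorithm(S, A, B, y, n_iter):
    """Recover f from y = S f via f_{k+1} = f_k + 2/(A+B) (y - S f_k), f_0 = 0."""
    f = np.zeros_like(y)
    for _ in range(n_iter):
        f = f + 2.0 / (A + B) * (y - S @ f)
    return f

# Hypothetical toy frame with very different element norms.
Psi = np.array([[10.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w = 1.0 / np.linalg.norm(Psi, axis=1)   # assumed weight choice: normalise each element

A, B, S = frame_bounds(Psi)
Aw, Bw, Sw = frame_bounds(Psi, w)

f = np.array([1.0, 1.0])
err_plain = np.linalg.norm(f - frame_algorithm(S, A, B, S @ f, 20))
err_weighted = np.linalg.norm(f - frame_algorithm(Sw, Aw, Bw, Sw @ f, 20))
print(B / A, Bw / Aw, err_plain, err_weighted)
```

In this example the weighted system has a much smaller bound ratio, and the same number of iterations gives a far smaller reconstruction error. Other frames may behave differently; normalising the elements is not guaranteed to help in general.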

In [1] we developed the related theory and derived some results, among them properties used in [Jacques, 2004] and [Bogdanova et al., 2005] without proof, and reported the results of numerical experiments. We showed that controlled frames are equivalent to standard frames, so this concept gives a generalized way to check the frame condition. The operator $T$ acts as a preconditioning operator and can therefore improve the numerical properties of the inversion of the frame operator. For general frames it seems difficult to find an appropriate preconditioning matrix, but for wavelet frames this technique is used in [Jacques, 2004, Bogdanova et al., 2005]. For Gabor frames, a way to find advantageous preconditioning matrices is presented in [Balazs et al., 2006].

In [1] we put some emphasis on the mutual relationship between weighted and controlled frames, showing in particular that weighted frames cannot always be considered as controlled frames. We also investigated how these concepts can improve the efficiency of iterative algorithms for inverting the frame operator. As a special case, we considered semi-normalized weights, for which the concepts of frames and weighted frames are interchangeable again. The connection to frame multipliers [Balazs, 2007], see also Section 2.3, was addressed. In particular we showed the following result:

Theorem Let $(\psi_n)$ be a sequence of elements in $H$. Let $w = (w_n)$ be a sequence of positive, semi-normalized weights. Then the following properties are equivalent:
1. $(\psi_n)$ is a frame.
2. $M_{w,\psi,\psi}$ is a positive and invertible operator.
3. The pair $(w_n)$, $(\psi_n)$ forms a weighted frame.
4. $(\sqrt{w_n}\,\psi_n)$ is a frame.
5. $M_{w',\psi,\psi}$ is a positive and invertible operator for any positive, semi-normalized sequence $w' = (w'_n)$.

We investigated the concept of weighted frames in numerical experiments.
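The equivalences in the theorem above can be checked numerically in finite dimensions, where "frame" simply means "spanning set". The sketch below is an illustration only; the vectors, the weights and the helper name `multiplier` are assumptions made for this example and are not part of [1].

```python
import numpy as np

def multiplier(m, Phi, Psi):
    """Matrix of f -> sum_n m_n <f, psi_n> phi_n (rows of Phi, Psi are the vectors, real case)."""
    return Phi.T @ (np.asarray(m)[:, None] * Psi)

Psi = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # spans R^2, hence a frame
w = np.array([0.5, 2.0, 1.0])                           # positive, semi-normalized weights

# Item 2: the multiplier M_{w,psi,psi}.
M = multiplier(w, Psi, Psi)
# Item 4: the frame operator of (sqrt(w_n) psi_n) is the same matrix.
Psi_w = np.sqrt(w)[:, None] * Psi
S_sqrt = multiplier(np.ones(3), Psi_w, Psi_w)
eig_frame = np.linalg.eigvalsh(M)

# A sequence that does not span R^2 is not a frame; the multiplier is then singular.
Psi_bad = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
eig_bad = np.linalg.eigvalsh(multiplier(w, Psi_bad, Psi_bad))
print(eig_frame, eig_bad)
```

For the spanning sequence all eigenvalues of the multiplier are strictly positive, whereas for the non-spanning sequence the smallest eigenvalue is zero.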
We analyzed three different a-priori choices for weights with the aim of making frames tighter, i.e., reducing the quotient of the frame bounds. These choices were

1. $\omega^{(2)}_n = \dfrac{\|\psi_n\|}{\sqrt{\sum_k |\langle \psi_n, \psi_k \rangle|^2}}$,

2. $\omega^{(\infty)}_n = \dfrac{\|\psi_n\|}{\sup_k |\langle \psi_n, \psi_k \rangle|}$,

3. $\omega^{(mult)}_n = \sum_{k=1}^{M} \Big[ \big( G^{(2)}_{\Psi} \big)^{\dagger} \Big]_{nk} \|\psi_k\|^2$,

where the last one corresponds to the best approximation of the identity by frame multipliers [Balazs, 2008b]. Here $G^{(2)}_{\Psi}$ is the matrix with entries $\big( G^{(2)}_{\Psi} \big)_{pq} = |\langle \psi_q, \psi_p \rangle|^2$ and $\dagger$ denotes the pseudo-inverse. In preliminary tests we found that other p-weights are outperformed by $\omega^{(2)}$ or $\omega^{(\infty)}$.

In [1] we gave the results of some numerical experiments, showing that these weights very often improve the condition number of the frame operator matrix. In particular the weight $\omega^{(2)}$ nearly always improves the frame bounds, while the weight $\omega^{(mult)}$ is often, but not always, the best choice among the given weights. We saw that redundancy is an important parameter for the optimality of these weights. We also examined the computational behavior of weighted Gabor frames. In particular we investigated how well the canonical dual weighted frame is approximated by the inversely weighted dual frame. We saw that the error depends linearly on the number of weighted elements and on the redundancy.

As shown above, the concept of weighted frames is naturally connected to the topic of multipliers [Balazs, 2007], [6-8]. The paper [1] was already cited in [Aceska, 2009, Balazs, 2008b, Antoine and Vandergheynst, 2007] (as a preprint) and in [2,6].

Frames and Semi Frames [2]

There are situations where the notion of frame is too restrictive, in the sense that one cannot satisfy both frame bounds simultaneously. The very famous sequence Gabor dealt with in his original paper [Gabor, 1946], a Gabor system with a Gaussian window and redundancy 1, is a complete Bessel sequence, but does not fulfill the lower frame condition. By symmetry, there is room for two natural generalizations. We say that a sequence $\Psi$ is an upper (resp. lower) semi-frame if (i) it is total in $H$; (ii) it satisfies the upper (resp. lower) frame inequality. Note that the lower frame inequality automatically implies that the sequence is total, i.e. (ii) $\Rightarrow$ (i) for a lower semi-frame.
Also, in the upper case, $S$ is bounded and $S^{-1}$ is unbounded, whereas, in the lower case, $S$ is unbounded and $S^{-1}$ is bounded. We may also remark that a discrete upper semi-frame is nothing but a complete Bessel sequence. These are the concepts we investigated in [2], also for continuous frames.

The definition of frames above, Equation 2.1, concerns sequences, as required in numerical analysis. However, more general objects, called continuous frames, emerged in the context of the theory of (generalized) coherent states in theoretical and mathematical physics and were thoroughly studied [Ali et al., 2000, Rahimi et al., 2006, Fornasier and Rauhut, 2005].

Let $X$ be a locally compact space with measure $\nu$. We assume that $X$ is $\sigma$-compact, that is, $X = \bigcup_n K_n$, $K_n \subset K_{n+1}$, with each $K_j$ relatively compact. Let $\Psi := \{\psi_x, x \in X\}$ be a family of vectors from a Hilbert space $H$ indexed by points of $X$. Then we say that $\Psi$ is a set of coherent states or a generalized frame if the map $x \mapsto \langle f, \psi_x \rangle$ is measurable for all $f \in H$ and
\[ \int_X \langle f, \psi_x \rangle \langle \psi_x, f' \rangle \, d\nu(x) = \langle f, S f' \rangle, \quad \forall f, f' \in H, \]
where $S$ is a bounded, positive, self-adjoint, invertible operator on $H$, called the frame operator. In Dirac's notation, the frame operator reads
\[ S = \int_X |\psi_x\rangle \langle \psi_x| \, d\nu(x). \]
The operator $S$ is invertible, but its inverse $S^{-1}$, while still self-adjoint and positive, need not be bounded. Thus, we say that $\Psi$ is a frame if $S^{-1}$ is bounded or, equivalently, if the (optimal) frame bounds satisfy $A > 0$ and $B < \infty$, so that
\[ A \|f\|^2 \le \langle f, S f \rangle = \int_X |\langle \psi_x, f \rangle|^2 \, d\nu(x) \le B \|f\|^2, \quad \forall f \in H. \qquad (2.4) \]
For frames the spectrum $\mathrm{Sp}(S)$ of $S$ is contained in the interval $[m, M]$, these two numbers being the infimum and the supremum of $\mathrm{Sp}(S)$, respectively (that is, the optimal frame bounds). These definitions are completely general. In particular, if $X$ is a discrete set with $\nu$ the counting measure, we recover the standard definition 2.1 of a (discrete) frame. If one has
\[ 0 < \int_X |\langle \psi_x, f \rangle|^2 \, d\nu(x) \le M \|f\|^2, \quad \forall f \in H, \ f \ne 0, \qquad (2.5) \]
then $\Psi$ is called a (continuous) upper semi-frame. In this case, $S^{-1}$ is unbounded, with dense domain $\mathrm{dom}(S^{-1})$. By symmetry (in fact, duality), we speak of a lower semi-frame if the upper frame bound is missing. Note that, since $S$ may now be unbounded, a lower semi-frame is no longer a set of coherent states as defined above. In [2] we studied mostly upper semi-frames and gave some remarks for the dual situation.
In particular, we showed that reconstruction is still possible, in a certain sense. We covered the general (continuous) case and then particularized the results to the discrete case. An important difference between these cases is how convergence is understood: weak convergence for the continuous case, strong convergence for the discrete case. For the discrete case, reconstruction on a dense subset clearly works for upper semi-frames that fulfill an additional condition:

Proposition Let $\Psi$ be a regular upper semi-frame for $H$, i.e. $\Psi \subset \mathrm{dom}(S^{-1})$. Then
\[ f = S S^{-1} f = \sum_k \langle S^{-1} \psi_k, f \rangle \, \psi_k, \quad \forall f \in R_S. \qquad (2.6) \]

If we use the Gram matrix $G$ we can give a different reconstruction formula:

Proposition For all $f \in R_D$, we have the reconstruction formula
\[ f = \sum_k \big[ G^{-1} \big( \langle f, \psi_k \rangle_H \big) \big]_k \, \psi_k \qquad (2.7) \]
with unconditional convergence.

For upper semi-frames, we can show that the following diagram is commutative:

[Commutative diagram relating $R_C$, $H$, $R_D$ and $R_S$ through the operators $C$, $D$, $S^{1/2}$ and $G^{1/2}$; the layout is not recoverable from the extracted text.]

This connection can be described in the context of Gelfand triples. It leads to an extension of the reconstruction formula. The reconstruction formulas given above are only valid for every $f \in R_D$ or require regular upper semi-frames. With the connections in the diagram we can give a reconstruction formula valid for all $f \in H$, even in the case when $\Psi \not\subset \mathrm{dom}(S^{-1})$, if we allow the analysis coefficients to be altered.

Theorem Let $(\psi_k)$ be an upper semi-frame. Then, for all $f \in H$, we have a reconstruction formula in which the analysis coefficients $\langle \psi_k, f \rangle$ are altered by fractional powers of $S$ and $G$ before resynthesis with $(\psi_k)$. [The displayed formula is not recoverable from the extracted text.]

2.1.4 Classification of General Sequences by Frame-Related Operators [3]

The frame condition cannot always be satisfied, and so other classes of sequences have been investigated, for example frame sequences, Bessel sequences, lower frame sequences, and Riesz-Fischer sequences [Balazs and El-Gebeily, 2008, Casazza et al., 2002, Christensen, 1995, Christensen, 2003]. For such sequences, which need not be frames in general, the frame-related operators, i.e. the analysis operator $C$, the synthesis operator $D$ and the frame operator $S$, can still be defined, see e.g. [Casazza et al., 2002, Christensen, 1995]. In these cases, these operators can be unbounded. In [3] we gave an overview of the connection between the properties of those operators and those of the sequences. This paper is both a survey and an original research paper. While some results about the connection between the properties of the frame-related operators and the sequences existed, they were spread over many papers, and many gaps remained. We collected existing results, extended them and added new, original results, leading to results like the following:

Proposition Given a sequence $\Psi$, the following statements hold.
(a1) $\Psi$ is a Bessel sequence if and only if the domain of $D$ is all of $\ell^2$, i.e. $\mathrm{dom}(D) = \ell^2$.
(a2) $\Psi$ is a Bessel sequence with bound $B$ if and only if $\mathrm{dom}(D) = \ell^2$ and $D$ is bounded with $\|D\| \le \sqrt{B}$.
(b1) $\Psi$ is a frame sequence if and only if $\mathrm{dom}(D) = \ell^2$ and $\mathrm{ran}(D)$ is closed.
(b2) $\Psi$ is a frame sequence if and only if $\mathrm{ran}(D)$ is closed and $\mathrm{ran}(D) \subseteq \mathrm{dom}(C)$.
(b3) $\Psi$ is a frame sequence if and only if $\mathrm{dom}(D) = \ell^2$ and $\mathrm{ran}(D) = \mathrm{ran}(S)$.
(c) $\Psi$ is a frame if and only if $\mathrm{dom}(D) = \ell^2$ and $D$ is surjective.
(d) $\Psi$ is a Riesz basis for $H$ if and only if $\mathrm{dom}(D) = \ell^2$ and $D$ is bijective.
(e) $\Psi$ is a lower frame sequence for $H$ if and only if $\mathrm{ran}(D)$ is dense in $H$ and $\mathrm{ran}(D^*)$ is closed.
(f) $\Psi$ is a Riesz-Fischer sequence if and only if $D$ is injective and $D^{-1}$ is bounded on $\mathrm{ran}(D)$.
(g) $\Psi$ is complete in $H$ if and only if $\mathrm{ran}(D)$ is dense in $H$.
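In finite dimensions the proposition above reduces to rank conditions on the synthesis matrix, which gives a quick way to experiment with the classification. The following is a minimal sketch under that assumption; the function name `classify` and the returned labels are hypothetical. For finitely many vectors the Bessel and frame-sequence properties hold trivially, since every operator is bounded and every range is closed.

```python
import numpy as np

def classify(Psi, tol=1e-10):
    """Classify a finite sequence (rows of Psi, vectors in K^d) via D c = sum_k c_k psi_k."""
    n, d = Psi.shape
    rank = np.linalg.matrix_rank(Psi, tol=tol)
    surjective = bool(rank == d)   # ran(D) is the whole space, item (c)
    injective = bool(rank == n)    # D is injective, item (f)
    return {
        "Bessel": True,            # always true for finitely many vectors
        "frame sequence": True,    # ranges are always closed in finite dimensions
        "frame": surjective,
        "Riesz-Fischer": injective,
        "Riesz basis": surjective and injective,   # item (d): D bijective
    }

overcomplete = classify(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
basis = classify(np.eye(2))
incomplete = classify(np.array([[1.0, 0.0]]))
print(overcomplete, basis, incomplete)
```

The three examples cover an overcomplete frame (surjective but not injective synthesis), a Riesz basis (bijective synthesis) and an incomplete Riesz-Fischer sequence (injective but not surjective synthesis).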

Some of these connections are well known or rather apparent, while others had to be proved. Similar results for $C$, $S$ and $G$ were also proved in [3]. Another way of classifying sequences is to consider them as images of orthonormal bases under specific classes of operators. For this approach we showed:

Proposition Let $(e_k)_{k=1}^{\infty}$ be an orthonormal basis for $H$.
(a) The Bessel sequences for $H$ are precisely the families $(V e_k)_{k=1}^{\infty}$, where $V : H \to H$ is a bounded operator.
(b) The frame sequences for $H$ are precisely the families $(V e_k)_{k=1}^{\infty}$, where $V : H \to H$ is a bounded operator with closed range.
(c) The frames for $H$ are precisely the families $(V e_k)_{k=1}^{\infty}$, where $V : H \to H$ is a bounded and surjective operator.
(d) The Riesz bases for $H$ are precisely the sequences $(V e_k)_{k=1}^{\infty}$, where $V : H \to H$ is a bounded bijective operator.
(e) The lower frame sequences for $H$ are precisely the families $(V e_k)_{k=1}^{\infty}$, where $V : \mathrm{dom}(V) \to H$ is a densely defined operator such that $e_k \in \mathrm{dom}(V)$ for all $k \in \mathbb{N}$, $V$ is injective with bounded inverse on $\mathrm{ran}(V)$, and $V\big( \sum_{k=1}^{n} c_k e_k \big) \to V\big( \sum_{k=1}^{\infty} c_k e_k \big)$ as $n \to \infty$ for every $\sum_{k=1}^{\infty} c_k e_k \in \mathrm{dom}(V)$.
(f) The Riesz-Fischer sequences are precisely the families $(V e_k)_{k=1}^{\infty}$, where $V$ is an operator having all $e_k$ in its domain and which has a bounded inverse $V^{-1} : \mathrm{ran}(V) \to H$.
(g) The complete sequences are precisely the families $(V e_k)_{k=1}^{\infty}$, where $V : \mathrm{dom}(V) \to H$ is a densely defined operator such that $e_k \in \mathrm{dom}(V)$ for all $k \in \mathbb{N}$, $\mathrm{ran}(V)$ is dense in $H$ (equivalently, the adjoint $V^*$ is injective) and $V\big( \sum_{k=1}^{n} c_k e_k \big) \to V\big( \sum_{k=1}^{\infty} c_k e_k \big)$ as $n \to \infty$ for every $\sum_{k=1}^{\infty} c_k e_k \in \mathrm{dom}(V)$.

2.2 Time-Frequency Analysis

State of the Art

The Fourier transform is a well-known mathematical tool to analyze the frequency content of a signal. It is defined on $L^1(\mathbb{R})$ by
\[ \mathcal{F}(f)(\omega) = \hat{f}(\omega) = \int_{\mathbb{R}} f(t) \, e^{-2\pi i \omega t} \, dt. \]

It can be extended by density to $L^2(\mathbb{R})$. Due to the very efficient algorithms of the fast Fourier transform (FFT), see e.g. [Walker, 1991], it has been used in many signal processing methods. When humans listen to a sound, a voice or music, they do not only hear frequencies and their amplitudes but also their dynamic development. So it is very natural to search for a joint time-frequency analysis. A well-known method for a time-frequency representation is the short-time Fourier transform (STFT). It is defined for $f, g \in L^2(\mathbb{R})$, see e.g. [Gröchenig, 2001], by
\[ V_g(f)(\tau, \omega) = \int_{\mathbb{R}} f(t) \, \overline{g(t - \tau)} \, e^{-2\pi i \omega t} \, dt. \]
The STFT $V_g(f)(\tau, \omega)$ provides information about the frequency content of the signal $f$ at time $\tau$ and frequency $\omega$. One way to look at this method is the following: the signal $f$ is multiplied with the shifted window function $g(t - \tau)$. This results in a windowed version of the signal that is concentrated at the time $\tau$ (if the window is chosen accordingly, localized around zero). Then the Fourier transform is applied to the result. Thus, the analyzing window $g$ determines the resolution in time and frequency, which is the same in the whole time-frequency domain.

This can also be seen as a projection of the signal $f$ on the time-frequency shifted Gabor atoms $M_\omega T_\tau g$, where $T$ denotes the translation operator $(T_\tau f)(t) = f(t - \tau)$ and $M$ the modulation operator $(M_\omega f)(t) = e^{2\pi i \omega t} f(t)$:
\[ V_g(f)(\tau, \omega) = \langle f, M_\omega T_\tau g \rangle. \]
The STFT is invertible:

Corollary Let $g, \gamma \in L^2(\mathbb{R})$ and $\langle g, \gamma \rangle_{L^2(\mathbb{R})} \ne 0$. Then
\[ f(t) = \frac{1}{\langle g, \gamma \rangle_{L^2(\mathbb{R})}} \iint_{\mathbb{R}^2} V_g f(s, \omega) \, \gamma(t - s) \, e^{2\pi i \omega t} \, ds \, d\omega. \]

This is a direct consequence of the orthogonality relations for the STFT:

Theorem Let $f_1, f_2, g_1, g_2 \in L^2(\mathbb{R})$. Then $V_{g_j} f_j \in L^2(\mathbb{R}^2)$ for $j = 1, 2$ and
\[ \langle V_{g_1} f_1, V_{g_2} f_2 \rangle_{L^2(\mathbb{R}^2)} = \langle f_1, f_2 \rangle_{L^2(\mathbb{R})} \, \overline{\langle g_1, g_2 \rangle_{L^2(\mathbb{R})}}. \]

If the STFT is not considered for continuous variables $\omega$ and $\tau$, but in a sampled version, $V_g f(ka, lb)$ for $k, l \in \mathbb{Z}$ and fixed constants $a, b$, it is called a Gabor transform. A Gabor system with time shift parameter $a$ and frequency shift parameter $b$ is given by
\[ G(g, a, b) = \{ M_{bl} T_{ak} g : k, l \in \mathbb{Z} \} = \{ e^{2\pi i b l x} g(x - ka) : k, l \in \mathbb{Z} \}. \]
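A discrete, periodic analogue of the Gabor system $G(g, a, b)$ can be built explicitly for short signals, which makes the frame property and the reconstruction from sampled STFT coefficients easy to inspect. The sketch below is a toy illustration under assumptions: the signal length, the lattice parameters, the window and the function name `gabor_system` are hypothetical, and the dense matrix approach is only meant for small examples, not as an efficient implementation. It also checks the known fact that the canonical dual of a Gabor frame is the Gabor system generated by $S^{-1} g$.

```python
import numpy as np

def gabor_system(g, a, b):
    """Columns are the atoms M_{lb} T_{ka} g on Z_L (periodic shifts), k < L/a, l < L/b."""
    L = len(g)
    n = np.arange(L)
    atoms = [np.exp(2j * np.pi * l * b * n / L) * np.roll(g, k * a)
             for k in range(L // a) for l in range(L // b)]
    return np.array(atoms).T

L, a, b = 24, 4, 3                       # hypothetical toy parameters, redundancy L/(a*b) = 2
n = np.arange(L)
dist = np.minimum(n, L - n)              # distance to 0 on the cyclic group Z_L
g = np.exp(-np.pi * (dist / 4.0) ** 2)   # periodised Gaussian-like window

G = gabor_system(g, a, b)
S = G @ G.conj().T                       # frame operator: S f = sum_{k,l} <f, g_kl> g_kl
eig = np.linalg.eigvalsh(S)
A, B = eig[0], eig[-1]                   # optimal frame bounds

rng = np.random.default_rng(0)
f = rng.standard_normal(L)
coeffs = G.conj().T @ f                  # Gabor coefficients <f, g_kl>
G_dual = np.linalg.solve(S, G)           # canonical dual atoms S^{-1} g_kl
f_rec = G_dual @ coeffs                  # reconstruction with the canonical dual frame
g_dual = np.linalg.solve(S, g)           # dual window S^{-1} g
print(A, B, np.linalg.norm(f - f_rec))
```

Since the lower bound $A$ is positive for these parameters, the system is a frame for $\mathbb{C}^{24}$ and the signal is recovered from its coefficients. The dual atoms coincide with the time-frequency shifts of the single dual window `g_dual`.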

In this sampled version the inversion is not automatic. Inversion is possible if the Gabor system forms a frame. The canonical dual frame of a Gabor frame is just the Gabor system of the dual window $\tilde{g} = S^{-1} g$. This is a very special property of Gabor systems. For example, the canonical dual of a wavelet frame need not be a wavelet frame again.

Apart from the topics mentioned in this habilitation thesis, [4-5], we have dealt with Gabor systems a lot in the past. The PhD thesis [Balazs, 2005] focused on irregular Gabor frames as well as on an efficient way to invert the Gabor frame operator by double preconditioning [Balazs et al., 2006]. We have investigated a particular property of the phase derivative of the STFT in [4], which was first discovered in numerical experiments and then proved mathematically. We also extended the standard Gabor approach to a more general one [5], allowing an adaptive time-frequency resolution either in time or in frequency.

The Phase Derivative Around Zeros of the Short-Time Fourier Transform [4]

The interpretation of the modulus of the STFT is relatively easy, considering the fact that the spectrogram (defined as the squared absolute value of the STFT) can be interpreted as a time-frequency distribution of the signal energy. This interpretation led to the important success of the STFT in signal processing. But the interpretation of the phase of the STFT is less obvious, and the phase is often not considered in applications. In most analysis/synthesis schemes that modify the STFT, the magnitude is modified, but the phase is not changed. This is a problem, as it is known that amplitude and phase of the STFT are not independent, but can even carry the same information; for Gaussian windows see [Gardner and Magnasco, 2006]. So a modification of the amplitude alone, without controlling the effect on the phase, can have unintended results. Therefore phase information is also pivotal for applications modifying the STFT coefficients.
So, for this type of application, in particular for applications using STFT or Gabor frame multipliers [Feichtinger and Nowak, 2003, Balazs, 2007], a better understanding of the structure of the phase is necessary to improve the processing possibilities. It is known that a multiplier has a local effect in the time-frequency plane, in the sense of a small time-frequency spread [Kozek, 1998]. But due to the uncertainty principle, it can never be perfectly localized. This contradicts the intuitive approach to a multiplier, where the multiplication would just correspond to amplification or attenuation of single time-frequency components. As a particularly interesting consequence of this phenomenon, a time-frequency shift of a signal could be realized by a complex multiplier, which manipulates the phase. To control this behavior and investigate how to exploit it, for example in an optimization of the effect of a multiplier by manipulating the phase, a thorough understanding of the effect of the phase is essential.

It is known that the phase of the DFT becomes arbitrary near zeros [Carmona et al., 1998], see also [Balazs et al., 2003]. So it could be expected that the STFT shows a similar behavior. Interestingly, in [4] we observed that the behavior of the phase derivative around zeros is far from being arbitrary. The over-complete representation of the STFT and the resulting reproducing kernel property is in contrast to the basis property of the DFT. This difference also leads to the afore-mentioned difference in the phase.

The phase of the STFT is usually not considered directly. In fact, it is more interesting to consider the phase derivative over time or frequency. Indeed, these quantities appear naturally in the context of reassignment [Auger and Flandrin, 1995], and manipulation of the phase derivative over time is the idea behind the phase vocoder [Dolson, 1986]. Their interpretation is easier, as the derivative of the phase over time can be interpreted as a local instantaneous frequency, while the derivative of the phase over frequency can be interpreted as a local group delay. The phase derivative over time is of particular interest for the analysis of signals containing sinusoidal components, as often encountered in acoustics [Dolson, 1986]. In [Auger and Flandrin, 1995] it is shown how the local instantaneous frequency gives access to the exact frequency of a slowly changing sinusoid, despite the spread in time and frequency normally produced by the STFT.

Numerical experiments first presented in [Jaillet et al., 2009b] show the peculiar behavior of the derivative of the phase at zeros of the STFT. These experiments are included and updated in [4].
When analyzing white noise, as can be seen in Figure 2.1, the time-frequency distribution of the values appeared to be structured. In particular, the values of the phase derivative with high absolute values are concentrated around several time-frequency points, which can be identified as the zeros of the transform when looking at the modulus. Furthermore, the shape of the phase derivative seems to be very similar in the neighborhood of the zeros, with a typical pattern repeating at each zero. This typical pattern is shown in the third image of Figure 2.1. When going from low to high frequencies, it presents a negative peak followed by a positive one. In [4] the mathematical background was investigated and it was shown that this phenomenon occurs for STFTs with certain regularity.
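The sign pattern described above can be made plausible with a linearised toy model of the transform around a simple zero. This is a sketch under strong assumptions and not an actual STFT: the transform $V = U + iW$ is replaced by its first-order approximation with Jacobian $J$, and the phase derivative over time is computed from $\partial_x \arg(U + iW) = (U W_x - W U_x)/(U^2 + W^2)$. The function name and the two example Jacobians are hypothetical.

```python
import numpy as np

def phase_derivative_x(J, dx, dw):
    """d/dx arg(U + iW) for the linear model (U, W)^T = J (dx, dw)^T around a zero.

    J = [[U_x, U_w], [W_x, W_w]] is the Jacobian at the zero; dx and dw are the
    time and frequency offsets from the zero.
    """
    U = J[0, 0] * dx + J[0, 1] * dw
    W = J[1, 0] * dx + J[1, 1] * dw
    return (U * J[1, 0] - W * J[0, 0]) / (U ** 2 + W ** 2)

J_pos = np.array([[1.0, 0.0], [0.0, 1.0]])      # det J = +1
J_neg = np.array([[0.0, 1.0], [1.0, 0.0]])      # det J = -1
deltas = np.array([-1e-1, -1e-3, 1e-3, 1e-1])   # frequency offsets below / above the zero

vals_pos = phase_derivative_x(J_pos, 0.0, deltas)
vals_neg = phase_derivative_x(J_neg, 0.0, deltas)
print(vals_pos, vals_neg)
```

In this model the phase derivative along the frequency axis behaves like $\mp 1/(\omega - \omega_0)$, with the sign determined by the sign of $\det J$. It therefore blows up with opposite signs on the two sides of the zero. The case $\det J < 0$ reproduces the negative peak followed by a positive one when going from low to high frequencies.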


Figure 2.1: Observation for a Gaussian white noise, using a Gaussian window (axes: time, angular frequency). Top: modulus of the STFT. Bottom-left: derivative over time of the phase of the STFT using the definition (2.2.1). Bottom-right: mesh plot of the derivative over time of the phase in the neighborhood of a zero of the STFT.

Theorem (Phase derivatives of the STFT) Let $f, g \in L^2(\mathbb{R})$. Assume that
\[ V(f, g) = V = U + iW \in C^2(\mathbb{R}^2, \mathbb{R}^2), \qquad V(x_0, \omega_0) = 0, \qquad \det J_V(x_0, \omega_0) \ne 0, \]
where
\[ J_V(x_0, \omega_0) = \begin{pmatrix} U_x(x_0, \omega_0) & U_\omega(x_0, \omega_0) \\ W_x(x_0, \omega_0) & W_\omega(x_0, \omega_0) \end{pmatrix} \]
denotes the Jacobian matrix of $V$ at the point $(x_0, \omega_0)$. Here we use the notation $U_x = \frac{\partial U}{\partial x}$, etc. Denote $\psi(x, \omega) = \arg\big( V(f, g)(x, \omega) \big)$. Then the phase derivative of the STFT satisfies
\[ \lim_{\omega \to \omega_0} \frac{\partial \psi}{\partial x}(x_0, \omega) = \begin{cases} +\infty, & \text{if } \omega \to \omega_0 \text{ from below,} \\ -\infty, & \text{if } \omega \to \omega_0 \text{ from above,} \end{cases} \]
for $\det J_V(x_0, \omega_0) > 0$, respectively
\[ \lim_{\omega \to \omega_0} \frac{\partial \psi}{\partial x}(x_0, \omega) = \begin{cases} -\infty, & \text{if } \omega \to \omega_0 \text{ from below,} \\ +\infty, & \text{if } \omega \to \omega_0 \text{ from above,} \end{cases} \]
for $\det J_V(x_0, \omega_0) < 0$. If $V(f, g) \in C^3(\mathbb{R}^2, \mathbb{R}^2)$, then the phase derivative of the STFT converges to some real number along the time axis:
\[ \lim_{x \to x_0} \frac{\partial \psi}{\partial x}(x, \omega_0) = c \in \mathbb{R}. \]

A similar result can be shown for the derivative in the other dimension, i.e. along the frequency axis. Furthermore it was shown that for windows in the Schwartz class these differentiability conditions of the STFT are fulfilled.

Non-Stationary Gabor Frames [5]

Gabor analysis [Feichtinger and Strohmer, 1998] is widely used for applications in signal processing. Nevertheless, when dealing with signals whose characteristics change over the time-frequency plane, the fixed time-frequency resolution over the whole time-frequency plane can be very restrictive. This led to the use of alternative decompositions with time-frequency resolution evolving with frequency, such as the wavelet transform [Flandrin, 1999], the constant Q transform (CQT) [Brown and Puckette, 1992] or filter banks based on perception, for example gammatone filters [Hartmann, 1998].

Figure 2.2: Example of a sampling grid of the time-frequency plane when building a decomposition with time-frequency resolution evolving over time (axes: time, frequency).

The standard Gabor theory was extended in [5] (a corresponding journal paper is in preparation), extending ideas in [Jaillet, 2005, Jaillet et al., 2009a], to provide some freedom of evolution of the time-frequency resolution of the decomposition in either time or frequency. Furthermore, this extension is well suited for applications, because it can easily be implemented using a fast algorithm based on the fast Fourier transform [Walker, 1991]. We replaced the regular time translation in standard Gabor analysis by the use of different windows. For each time position we still built atoms by regular frequency modulations:
\[ g_{m,n}(t) = g_n(t) \, e^{2\pi i m b_n t} = (M_{m b_n} g_n)(t). \]
Assuming that the windows $g_n$ are centered at different temporal positions, the sampling of the time-frequency plane is done on a grid which is irregular over time, but regular over frequency. Figure 2.2 shows an example of such a sampling grid. Here, as in the regular case, see e.g. [Gröchenig, 2001], we found conditions under which an efficient way to calculate the canonical dual window can easily be given ("painless reconstruction"). More precisely:

Theorem For every $n \in \mathbb{Z}$, let the function $g_n \in L^2(\mathbb{R})$ be compactly supported with $\mathrm{supp}(g_n) \subseteq [c_n, d_n]$ and let $b_n$ be chosen such that $d_n - c_n \le \frac{1}{b_n}$. Then the frame operator $S$ of the system
\[ g_{m,n}(t) = g_n(t) \, e^{2\pi i m b_n t}, \quad m \in \mathbb{Z}, \ n \in \mathbb{Z}, \]
is given by a multiplication operator of the form
\[ S f(s) = \Big( \sum_n \frac{1}{b_n} |g_n(s)|^2 \Big) f(s). \]

When this condition is fulfilled, the canonical dual frame elements are given by
\[ \tilde{g}_{m,n}(t) = \frac{g_n(t)}{\sum_k \frac{1}{b_k} |g_k(t)|^2} \, e^{2\pi i m b_n t}, \]
and the associated canonical tight frame elements can be calculated by
\[ g^{(t)}_{m,n}(t) = \frac{g_n(t)}{\sqrt{\sum_k \frac{1}{b_k} |g_k(t)|^2}} \, e^{2\pi i m b_n t}. \]

An analogous construction is possible with a sampling of the time-frequency plane that is irregular over frequency, but regular over time. In this case, we introduced a family of functions $\{h_m\}_{m \in \mathbb{Z}}$ of $L^2(\mathbb{R})$, and for $m \in \mathbb{Z}$ and $n \in \mathbb{Z}$ we define atoms of the form
\[ h_{m,n}(t) = h_m(t - n a_m). \]
In practice each function $h_m$ will be chosen as a well localized pass-band function having a Fourier transform centered around some frequency $b_m$. In this case the frame operator is given by
\[ S f = \sum_m \sum_n \langle f, h_{m,n} \rangle \, h_{m,n}, \]

and the problem is completely analogous to the preceding one up to a Fourier transform:
\[ \widehat{S f} = \sum_m \sum_n \langle \hat{f}, \hat{h}_{m,n} \rangle \, \hat{h}_{m,n}, \qquad \hat{h}_{m,n}(\nu) = \hat{h}_m(\nu) \, e^{-2\pi i n a_m \nu}. \]
With this approach filter banks can be implemented; in particular an invertible constant Q transform can be defined and implemented. It can be shown that wavelet frames can be interpreted (and therefore implemented) in this setting (a journal paper is in preparation).

Figure 2.3: Two (stationary) spectrograms of the same glockenspiel signal obtained using two different window lengths (axes: time in s, frequency in Hz). On the left plot, a narrow window of 6 ms is used; on the right plot, a wide window of 93 ms is used. At the bottom a spectrogram using a non-stationary Gabor transform is shown.

2.3 Theory of Frame Multipliers

State of the Art

As mentioned above, frames need not only be used for analyzing functions, but can also be used for the description of operators. One particular way to define operators is the following: Let $H_1$ and $H_2$ be Hilbert spaces. Fix a sequence $m = (m_k) \in \ell^\infty(K)$. Then the operator defined by
\[ M_{m,(\phi_k),(\psi_k)}(f) = \sum_k m_k \langle f, \psi_k \rangle \, \phi_k \qquad (2.8) \]
is called the Bessel multiplier for the Bessel sequences $(\psi_k)$ and $(\phi_k)$, or frame multiplier if the two sequences are frames. The sequence $m = (m_k)$ is called the symbol of the multiplier.

In [Schatten, 1960], such operators were investigated for orthonormal families $(\phi_k)$ and $(\psi_k)$. Operators of this kind were investigated for regular Gabor frames in [Feichtinger and Nowak, 2003]. In [Balazs, 2007] such operators were introduced for general Bessel and frame sequences, and several basic properties of frame multipliers were investigated. In particular, the implications of summability properties of the symbol for the membership of the corresponding operators in certain operator classes are specified: for Bessel sequences, symbols in $\ell^\infty$, $c_0$, $\ell^1$ and $\ell^2$ induce bounded, compact, trace-class and Hilbert-Schmidt operators, respectively. As a special case the multipliers for Riesz bases are examined, and it is shown that multipliers in this case can be easily composed and inverted. The inverted multiplier is just the multiplier with the inverted symbol and the bi-orthogonal sequences (in switched roles), i.e.
\[ M^{-1}_{m_k,(\phi_k),(\psi_k)} = M_{\frac{1}{m_k},(\tilde{\psi}_k),(\tilde{\phi}_k)}. \]
Finally the continuous dependence of a Bessel multiplier on its parameters (i.e. the involved sequences and the symbol in use) is verified, using a special measure of similarity of sequences.

Applications in acoustics traditionally use time-invariant filters. These systems can be described by the multiplication of the frequency spectrum of the signal by a fixed function, the so-called transfer function [Oppenheim and Schafer, 1999]. If the multiplication is done on the time-frequency plane, we naturally arrive at Gabor frame multipliers, which therefore are a particular choice to implement time-variant filtering, called Gabor filters [Matz and Hlawatsch, 2002] in signal processing.
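The definition (2.8) and the inversion rule for Riesz bases can be illustrated with small matrices, since in finite dimensions a Riesz basis is simply a basis. The following sketch assumes real vectors stored as matrix rows; the helper names `multiplier` and `canonical_dual` and the example data are hypothetical and only serve this illustration.

```python
import numpy as np

def multiplier(m, Phi, Psi):
    """Matrix of M f = sum_k m_k <f, psi_k> phi_k (rows of Phi, Psi are the vectors, real case)."""
    return Phi.T @ (np.asarray(m)[:, None] * Psi)

def canonical_dual(Phi):
    """Rows are S^{-1} phi_k, with S = Phi^T Phi the (symmetric) frame operator."""
    return Phi @ np.linalg.inv(Phi.T @ Phi)

Phi = np.array([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 2.0]])   # a basis of R^3
Psi = np.array([[2.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]])   # another basis of R^3
m = np.array([0.5, -2.0, 3.0])                                        # semi-normalized symbol

M = multiplier(m, Phi, Psi)
# Inverted symbol and canonical duals in switched roles.
M_inv = multiplier(1.0 / m, canonical_dual(Psi), canonical_dual(Phi))
print(np.round(M @ M_inv, 12))
```

The product is the identity matrix, in line with the inversion formula stated above. For redundant frames this simple rule is not available in general, since the canonical dual is then no longer bi-orthogonal to the frame.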
In computational auditory scene analysis they are known by the name time-frequency masks [Wang and Brown, 2006] and are used to extract single sound sources out of a mixture of sounds in a way linked to human auditory perception. Multipliers are interesting not only from a theoretical point of view, see e.g. [Balazs, 2008b, Dörfler and Torrésani, 2010], but also for sound morphing [Depalle et al., 2007], sound classification [Olivero et al., 2009], psychoacoustical modeling [9], see Section 2.4, or denoising in the time-frequency plane [12], see Section 2.5.

This concept was extended to p-frames in Banach spaces in [6]. In [7] sufficient and necessary conditions for the unconditional convergence and invertibility of multipliers were investigated. In [8] an extensive list of examples and counter-examples for the invertibility of multipliers was collected. To shorten notation in [7] and [8] we use the following abbreviations: R.b. - Riesz basis, fr. - frame, B. - Bessel sequence, SN - semi-normalized, $\|\cdot\|$-SN - norm-semi-normalized, NBB - norm-bounded below, NBA - norm-bounded above, unc. conv. - unconditionally convergent on $H$, INV. - invertible on $H$. Recall that a Riesz basis is always $\|\cdot\|$-SN and a Bessel sequence is always NBA.

Multipliers for p-Bessel sequences in Banach spaces [6]

One way to extend the concept of frames (and related concepts) from Hilbert spaces to Banach spaces is the following: A countable family $(\psi_i)_{i \in I} \subseteq X^*$ is a p-frame for the Banach space $X$ ($1 < p < \infty$) if constants $A, B > 0$ exist such that
\[ A \|f\|_X \le \Big( \sum_{i \in I} |\psi_i(f)|^p \Big)^{\frac{1}{p}} \le B \|f\|_X \quad \text{for all } f \in X. \]
It is called a p-Bessel sequence with bound $B$ if the second inequality holds. In this Banach space setting we can define multipliers in the following way:

Definition Let $(\psi_k) \subseteq X_1^*$ be a p-Bessel sequence for $X_1$ with bound $B_1$, let $(\phi_k) \subseteq X_2$ be a q-Bessel sequence for $X_2^*$ with bound $B_2$, and let $m \in \ell^\infty$. The operator $M_{m,(\phi_k),(\psi_k)} : X_1 \to X_2$ defined by
\[ M_{m,(\phi_k),(\psi_k)}(f) = \sum_k m_k \, \psi_k(f) \, \phi_k \]
is called a (p, q)-Bessel multiplier.

We obtained the following theorem, which is a generalization of one of the results in [Balazs, 2007]:

Theorem Let $M = M_{m,(\phi_k),(\psi_k)}$ be a (p, q)-Bessel multiplier for the p-Bessel sequence $(\psi_k) \subseteq X_1^*$ and the q-Bessel sequence $(\phi_k) \subseteq X_2$ with bounds $B_1$ and $B_2$. Then the following hold.
1. If $m \in \ell^\infty$, $M$ is a well-defined bounded operator with $\|M\|_{Op} \le B_2 B_1 \|m\|_\infty$. Furthermore, the sum $\sum_k m_k \psi_k(f) \phi_k$ converges unconditionally for all $f \in X_1$.
2. $M^*_{m,(\phi_k),(\psi_k)} = \sum_k m_k \, \kappa(\phi_k)(\cdot) \, \psi_k = M_{m,(\psi_k),(\kappa(\phi_k))}$.
3. If $m \in c_0$, $M$ is a compact operator.

A perturbation result could also be shown, as a generalization of the results for the Hilbert space setting. On the other hand, the concept of p-Schatten class operators, like Hilbert-Schmidt operators, cannot be easily extended to the Banach frame case. For the definition of nuclear operators as found in [Pietsch, 1980] it is easy to show:

Corollary Let $(\psi_k) \subseteq X_1^*$ be a p-Bessel sequence for $X_1$ with bound $B_1$, and let $(\phi_k) \subseteq X_2$ be a q-Bessel sequence for $X_2^*$ with bound $B_2$. Let $r > 0$ and $m \in \ell^r$. Then $M_{m,(\phi_k),(\psi_k)}$ is an (r, p, q)-nuclear operator.

Unconditional Convergence and Invertibility of Multipliers [7]

For a frame $(\phi_n)$ and a positive (resp. negative) semi-normalized sequence $(m_n)$, the multiplier $M_{(m_n),(\phi_n),(\phi_n)}$ is the frame operator $S$ (resp. $-S$) for the frame $(\sqrt{|m_n|}\,\phi_n)$ and thus $M_{(m_n),(\phi_n),(\phi_n)}$ is invertible [1]. When $(\phi_n)$ and $(\psi_n)$ are Riesz bases and $(m_n)$ is semi-normalized, then $M_{(m_n),(\phi_n),(\psi_n)}$ is invertible and
\[ M^{-1}_{(m_n),(\phi_n),(\psi_n)} = M_{(\frac{1}{m_n}),(\tilde{\psi}_n),(\tilde{\phi}_n)}, \]
where $(\tilde{\phi}_n)$ and $(\tilde{\psi}_n)$ denote the canonical duals of $(\phi_n)$ and $(\psi_n)$, respectively, see [Balazs, 2007]. If $(\phi^d_n)$ is a dual frame of the frame $(\phi_n)$, then $M_{(1),(\phi_n),(\phi^d_n)}$ is the identity operator and therefore invertible. If $m \in c_0$ and both $(\phi_n)$ and $(\psi_n)$ are Bessel sequences, then the multiplier $M_{(m_n),(\phi_n),(\psi_n)}$ is never invertible on an infinite-dimensional Hilbert space, because it is a compact operator [Balazs, 2007].

In [7] we considered the question of the invertibility of multipliers $M_{(m_n),(\phi_n),(\psi_n)}$ more closely. The involved sequences did not necessarily have to be Bessel sequences, and the symbol was not always assumed to be bounded. So different cases for $(\phi_n)$ and $(\psi_n)$ were considered: non-Bessel sequences, Bessel sequences, overcomplete frames, and Riesz bases. The unconditional convergence of multipliers was considered; in particular sufficient and/or necessary conditions were determined, which are needed for the results about invertibility.
As an example, let us mention the following results:

Proposition For any sequences $m$, $\Phi$, and $\Psi$, the multiplier $M_{m,\Phi,\Psi}$ is unconditionally convergent on $H$ if and only if $M_{m,\Psi,\Phi}$ is unconditionally convergent on $H$.

For conditional convergence this result is no longer true; a counter-example is given in [7]. Sufficient and/or necessary conditions for the invertibility of $M_{(m_n),(\phi_n),(\psi_n)}$ were given. In particular, if a multiplier for two Bessel sequences and a bounded symbol is invertible, the involved sequences are already frames:

Theorem Let $M_{m,\Phi,\Psi}$ be invertible on $\mathcal{H}$. If $\Psi$ and $\Phi$ are Bessel sequences for $\mathcal{H}$ and $m \in \ell^\infty$, then $\Psi$ and $\Phi$ are frames for $\mathcal{H}$; $m\Phi$ and $m\Psi$ are also (weighted) frames for $\mathcal{H}$.

If the multipliers are invertible, formulas for $M^{-1}_{(m_n),(\phi_n),(\psi_n)}$ are determined, for example in the following case:

Proposition Let $\Phi = (\phi_n)$ be a frame for $\mathcal{H}$. Assume that

$\mathcal{P}_1$: $\exists\, \mu \in [0, A_\Phi^2 / B_\Phi)$ such that $\sum_n |\langle f, m_n \psi_n - \phi_n \rangle|^2 \le \mu \|f\|^2$ for all $f \in \mathcal{H}$.

Then $m\Psi$ is a frame for $\mathcal{H}$, the multipliers $M_{m,\Phi,\Psi}$ and $M_{m,\Psi,\Phi}$ are invertible on $\mathcal{H}$ and

$$\frac{1}{B_\Phi + \sqrt{\mu B_\Phi}}\,\|h\| \le \|M^{-1} h\| \le \frac{1}{A_\Phi - \sqrt{\mu B_\Phi}}\,\|h\|, \quad \forall h \in \mathcal{H}, \qquad (2.9)$$

$$M^{-1} = \sum_{k=0}^{\infty} \left[ S_\Phi^{-1} (S_\Phi - M) \right]^k S_\Phi^{-1}, \qquad (2.10)$$

where $M$ denotes any one of $M_{m,\Phi,\Psi}$ and $M_{m,\Psi,\Phi}$. As a consequence, if $m$ is semi-normalized, then $\Psi$ is also a frame for $\mathcal{H}$.

Several results in the spirit of the one above were shown. The sharpness of the bounds in those results, as well as their independence, was shown with examples and counterexamples. It is planned to use these results in numerical algorithms in the future, to improve their efficiency and to apply them to acoustical applications.

Finally, for the case that one of the sequences is a Riesz basis, we give a full classification of the possibilities when multipliers can be invertible. The results are collected in the following corollary:

Corollary Let $\Phi$ be a Riesz basis for $\mathcal{H}$. Then $M_{m,\Phi,\Psi}$ (resp. $M_{m,\Psi,\Phi}$) is invertible on $\mathcal{H}$ if and only if $m\Psi$ is a Riesz basis for $\mathcal{H}$. Further, the following holds.

(i) If $\Psi$ is a Riesz basis for $\mathcal{H}$, then $M_{m,\Phi,\Psi}$ (resp. $M_{m,\Psi,\Phi}$) is invertible on $\mathcal{H}$ if and only if $m$ is SN.

(ii) If $m$ is SN, then $M_{m,\Phi,\Psi}$ (resp. $M_{m,\Psi,\Phi}$) is invertible on $\mathcal{H}$ if and only if $\Psi$ is a Riesz basis for $\mathcal{H}$.

(iii) If $m$ is not SN, then $M_{m,\Phi,\Psi}$ (resp. $M_{m,\Psi,\Phi}$) can be invertible on $\mathcal{H}$ only in the following cases:
- $\Psi$ is non-NBB and Bessel for $\mathcal{H}$, which is not a frame for $\mathcal{H}$, and $m$ is NBB, but not in $\ell^\infty$;
- $\Psi$ is non-NBA, NBB, and non-Bessel for $\mathcal{H}$, $m$ is non-NBB and $m \in \ell^\infty$;
- $\Psi$ is non-NBA, non-NBB, and non-Bessel for $\mathcal{H}$, $m$ is non-NBB and $m \notin \ell^\infty$.

In the cases of invertibility, $M^{-1}_{m,\Phi,\Psi} = M_{(1),\widetilde{m\Psi},\widetilde{\Phi}}$ and $M^{-1}_{m,\Psi,\Phi} = M_{(1),\widetilde{\Phi},\widetilde{m\Psi}}$. In the cases (i) and (ii) this corresponds to $M^{-1}_{m,\Phi,\Psi} = M_{1/m,\widetilde{\Psi},\widetilde{\Phi}}$ and $M^{-1}_{m,\Psi,\Phi} = M_{1/m,\widetilde{\Phi},\widetilde{\Psi}}$.

Detailed characterization of conditions for the unconditional convergence and invertibility of multipliers [8]

In [7] the focus was on existence results and formulas for the inversion. In [8] we presented tables and investigated the unconditional convergence and the invertibility of multipliers $M_{m,\Phi,\Psi}$. There we gave a complete set of examples, varying the type of the sequences $\Phi = (\phi_n)$, $\Psi = (\psi_n)$ (non-Bessel, Bessel non-frames, frames that are not Riesz bases, Riesz bases; norm-semi-normalized, non-norm-semi-normalized, with all possible combinations) and varying the symbol $m$ (semi-normalized; in $\ell^\infty$ but non-semi-normalized; not in $\ell^\infty$). In this paper we decided to focus on the general frame level, not including coherent frames [Ali et al., 2000] like Gabor systems [Feichtinger and Strohmer, 1998], wavelet systems [Flandrin, 1999] or frames of translates [Casazza et al., 2001]. Therefore we constructed the examples by manipulating abstract orthonormal sequences or bases.

We listed all possible combinations and gave a full classification of whether multipliers under those conditions can be (POSSIBLE), have to be (ALWAYS) or never can be (NOT POSSIBLE) unconditionally convergent and invertible (resp. non-invertible) on the given Hilbert space. Please note that examples of multipliers that are equal to the identity give examples for those cases where sequences can be dual to each other. We only considered sequences with non-zero elements, since otherwise, for example, both the invertible identity operator and the zero operator can be described as multipliers by putting zeros at appropriate places (see [7] for details).
To shorten notation, for $\nu = (\nu_n)$, $\Theta = (\theta_n)$, $\Xi = (\xi_n)$, we will write $M_{m,\Phi,\Psi} = M_{\nu,\Xi,\Theta}$ if there exist scalar sequences $(c_n)$, $(d_n)$ so that $\xi_n = c_n \phi_n$, $\theta_n = d_n \psi_n$ and $m_n = \nu_n c_n d_n$ for every $n$. This means that in the series of the multipliers the summands are the same element-wise. As an example for the results in [8] we present one table with the connected examples, where we only consider unconditionally convergent multipliers. For this we need the following results:

Lemma 2.3.7 Let $G_k$ denote the multiplier $M_{(1/n^k),(e_n),(e_n)}$, $k \in \mathbb{N}$. Then $G_k$ is unconditionally convergent on $\mathcal{H}$ and not invertible on $\mathcal{H}$.

Proposition Let $\Phi$ be an NBB Bessel sequence for $\mathcal{H}$ which is not a frame for $\mathcal{H}$. Then, for any $\Psi$ and any $m$, the multiplier $M_{m,\Phi,\Psi}$ (resp. $M_{m,\Psi,\Phi}$) cannot be both unconditionally convergent on $\mathcal{H}$ and invertible on $\mathcal{H}$.

Example 2.3.1 Let $\Phi = (e_2, e_3, e_4, e_5, \ldots)$.
(i) Let $m = (1, 1, 1, 1, \ldots)$. Then $M_{m,\Phi,\Phi}$ is clearly unconditionally convergent on $\mathcal{H}$ and not surjective.
(ii) Let $m = (\frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \frac{1}{5}, \ldots)$. Then $M_{m,\Phi,\Phi}$ is clearly unconditionally convergent on $\mathcal{H}$ and not surjective.

Example 2.3.2 Let $\Phi = (e_2, e_3, e_4, e_5, \ldots)$ and $\Psi = (\frac{1}{2} e_2, \frac{1}{3} e_3, \frac{1}{4} e_4, \frac{1}{5} e_5, \ldots)$.
(i) Let $m = (1, 1, 1, 1, \ldots)$. Then $M_{m,\Phi,\Psi}$ and $M_{m,\Psi,\Phi}$ are clearly unconditionally convergent on $\mathcal{H}$ and not surjective.
(ii) Let $m = (\frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \frac{1}{5}, \ldots)$. Then $M_{m,\Phi,\Psi}$ and $M_{m,\Psi,\Phi}$ are clearly unconditionally convergent on $\mathcal{H}$ and not surjective.
(iii) Let $m = (2, 3, 4, 5, \ldots)$. Then $M_{m,\Phi,\Psi}$ and $M_{m,\Psi,\Phi}$ are clearly unconditionally convergent on $\mathcal{H}$ and not surjective.

Example 2.3.3 Let $\Phi = (\frac{1}{n} e_n)$.
(i) Let $m = (1)$. Then $M_{m,\Phi,\Phi} = M_{(1/n^2),(e_n),(e_n)} = G_2$, which is unconditionally convergent and non-invertible on $\mathcal{H}$ (see Lemma 2.3.7).
(ii) Let $m = (\frac{1}{n})$. Then $M_{m,\Phi,\Phi} = M_{(1/n^3),(e_n),(e_n)} = G_3$, which is unconditionally convergent and non-invertible on $\mathcal{H}$ (see Lemma 2.3.7).
(iii) Let $m = (n^2)$. Then $M_{m,\Phi,\Phi} = M_{(1),(e_n),(e_n)} = I$.
(iv) Let $m = (n)$. Then $M_{m,\Phi,\Phi} = M_{(1/n),(e_n),(e_n)} = G_1$, which is unconditionally convergent and non-invertible on $\mathcal{H}$ (see Lemma 2.3.7).
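The last example can be made tangible by truncating the orthonormal basis to its first $N$ elements, which is an assumption made here purely for illustration. In that setting a multiplier built from scaled basis vectors is a diagonal matrix, and its lower bound is the smallest absolute diagonal entry. The function names below are hypothetical.

```python
import numpy as np

def equivalent_symbol(m, a, b):
    """Symbol nu with M_{m,(a_n e_n),(b_n e_n)} = M_{nu,(e_n),(e_n)} for real scalings a, b."""
    return m * a * b

def lower_bound(nu):
    """Largest c with ||M f|| >= c ||f|| for the diagonal operator diag(nu)."""
    return np.min(np.abs(nu))

for N in (10, 100, 1000):
    n = np.arange(1, N + 1, dtype=float)
    scale = 1.0 / n                                       # Phi = Psi = (e_n / n)
    g2 = equivalent_symbol(np.ones(N), scale, scale)      # case (i): symbol 1/n^2
    g1 = equivalent_symbol(n, scale, scale)               # case (iv): symbol 1/n
    ident = equivalent_symbol(n ** 2, scale, scale)       # case (iii): symbol 1
    print(N, lower_bound(g2), lower_bound(g1), lower_bound(ident))
```

For the symbols of $G_1$ and $G_2$ the lower bound of the truncated operator shrinks as $N$ grows, so the inverses of the truncations are not uniformly bounded; this is the finite-dimensional shadow of the non-invertibility in Lemma 2.3.7. For $m = (n^2)$ the truncation is the identity for every $N$.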

Table 3: Two Bessel sequences which are not frames ($\Phi$ Bessel, not a frame; $\Psi$ Bessel, not a frame). Each entry classifies $M_{m,\Phi,\Psi}$ and $M_{m,\Psi,\Phi}$ as invertible (INV.) or non-invertible (NON-INV.).

| $\Phi$ | $\Psi$ | $m$ SN: INV. | $m$ SN: NON-INV. | $m \in \ell^\infty$, non-SN: INV. | $m \in \ell^\infty$, non-SN: NON-INV. | $m \notin \ell^\infty$: INV. | $m \notin \ell^\infty$: NON-INV. |
|---|---|---|---|---|---|---|---|
| $\|\cdot\|$-SN | $\|\cdot\|$-SN | NOT POSSIBLE (see Prop.) | ALWAYS (apply Prop.; Example 2.3.1(i)) | NOT POSSIBLE (see Prop.) | ALWAYS (apply Prop.; Example 2.3.1(ii)) | NOT POSSIBLE (not unc. conv., see [7]) | NOT POSSIBLE (not unc. conv., see [7]) |
| $\|\cdot\|$-SN | non-NBB | NOT POSSIBLE (see Prop.) | ALWAYS (apply Prop.; Example 2.3.2(i)) | NOT POSSIBLE (see Prop.) | ALWAYS (apply Prop.; Example 2.3.2(ii)) | NOT POSSIBLE (see Prop.) | POSSIBLE (Example 2.3.2(iii)) |
| non-NBB | non-NBB | NOT POSSIBLE (see Prop.) | ALWAYS (apply Prop.; Example 2.3.3(i)) | NOT POSSIBLE (see Prop.) | ALWAYS (apply Prop.; Example 2.3.3(ii)) | POSSIBLE (Example 2.3.3(iii)) | POSSIBLE (Example 2.3.3(iv)) |

2.4 Applications in Acoustics: Time-Frequency Sparsity by Perceptual Irrelevance

State of the art

An interesting area of acoustics research is the field of human auditory perception. It is known in psychoacoustics [Zwicker and Fastl, 1990] that not all time-frequency components of a real-world acoustic signal can be perceived by the human auditory system. More precisely, some time-frequency components mask other components which are close in the time-frequency domain. Masking refers to the process where the threshold of audibility for one sound (the target) is raised by the presence of another sound (the masker). Masking can render the masked sound inaudible.

Masking occurs in two main signal configurations: simultaneous occurrence of target and masker is referred to as simultaneous, frequency or spectral masking [Moore, 1989]; non-simultaneous occurrence of target and masker is referred to as temporal masking, see e.g. [Fastl, 1976]. To investigate spectral masking, the frequency separation between target and masker is varied. In the most common method, that of masking patterns, the masker frequency is fixed and the amount of masking is measured for various target frequencies. To investigate temporal masking, the frequencies of masker and target are identical and the temporal separation between masker and target is varied, resulting in the temporal masking function. Backward masking (the target temporally precedes the masker) is weaker than forward masking. The amounts of backward and forward masking depend on the masker duration. Because of the specific demands in the simultaneous and non-simultaneous masking experiments reported in the literature, the experimental stimuli were almost always broad either in the temporal domain (e.g., long-lasting sinusoids), the frequency domain (e.g., clicks), or both.
Real-world sounds are broadband signals and therefore involve mutual masking effects between the individual narrowband components into which the signal can be separated. This raises the question of how the masking effects of more than one simultaneous masker on a target add up. To a first approximation, the masked thresholds elicited by two individual maskers have to be added linearly in the power domain to derive the combined masked threshold [Moore, 1985]. For two equally effective maskers this means, on the logarithmic scale, that the masked threshold in the presence of both maskers is 3 dB higher than that for one masker alone. This rule may apply if side effects are ruled out, such as the detection of combination products, the detection of the target at a tonotopic place aside from the target frequency (so-called off-frequency listening), or detection during the minima in the temporal envelope of the masker (dip-listening). In many configurations, however, particularly if the maskers do not overlap at the auditory filter centered at the target, the additivity of masking can be larger than predicted by the linear addition rule [Humes et al., 1992]. Furthermore, little is known about the additivity of masking for more than two maskers [9].

Another effect complicating the prediction of masking effects for real-world sounds is that the auditory system integrates signal information across frequencies to detect a signal. As an example, the masked threshold for two simultaneously presented sinusoids equally contributing to detection is about 2.5 dB lower than the masked thresholds for each sinusoid alone. This implies that two (or more) spectral components of a complex signal may be audible even if each of them separately is below the masked threshold. In addition, the maximum bandwidth up to which spectral integration is efficient depends on the signal duration [van den Brink and Houtgast, 1990]. Furthermore, mutual suppression effects between individual spectral components of a sound may reduce the effective masking effect evoked by those components [Humes and Jesteadt, 1989].

A well-known technique to reduce the digital size of an audio file, the MP3 audio codec [Brandenburg, 1999], uses a model of human auditory perception for compression. This and similar perceptual audio codecs allocate low bit rates to frequency channels which are subject to masking effects and thus have little or no perceptual relevance. This technique is very efficient in reducing the capacities required for transmitting and storing audio files. In contrast to this coding approach, which results in additional quantization noise, an irrelevance filter detects those time-frequency components whose removal causes no audible difference to the original signal.
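As a small worked example of the linear addition rule in the power domain quoted in the state of the art above, the following sketch converts masked thresholds from dB to power, adds them, and converts back. The function name and the 40 dB level are hypothetical values chosen for illustration.

```python
import math

def combined_threshold_db(levels_db):
    """Add masked thresholds linearly in the power domain and return the result in dB."""
    total_power = sum(10.0 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(total_power)

# Two equally effective maskers, each producing a masked threshold of 40 dB (hypothetical).
single = 40.0
increase = combined_threshold_db([single, single]) - single
print(increase)  # equals 10 * log10(2), i.e. roughly 3 dB
```

The increase is independent of the chosen level, since $10 \log_{10}(2 \cdot 10^{L/10}) = L + 10 \log_{10} 2$.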
Deleting such masked and thus perceptually irrelevant components makes the signal representation sparser. An irrelevance filter based on a Gabor frame multiplier approach was presented in [9]. This filter only considers a simple simultaneous masking model, and many more irrelevant components could still be extracted. It will be extended using new psychoacoustical data on time-frequency masking for well-concentrated atoms and models as in [10]. Such an extension of the irrelevance algorithm, based on a non-stationary Gabor transform adapted to an auditory filterbank, is currently in development.

Time-Frequency Sparsity by Removing Perceptually Irrelevant Components Using a Simple Model of Simultaneous Masking [9]

A heuristic irrelevance algorithm developed by Eckel and Deutsch [Eckel, 1989] had existed at the Acoustics Research Institute for years. The aim of providing the mathematical and signal processing background for it started our whole research towards Gabor and frame multipliers. This was also the aim of the algorithm presented in [9], referred to as the irrelevance filter. Its goal is to remove those time-frequency components in a standard Gabor transform whose removal causes no audible difference to the original signal after resynthesis. Note the difference to perceptual audio codecs: they use a low bit depth and thus introduce quantization noise in frequency bands where the signal falls below the masked threshold. In contrast, in the proposed model we want to either keep a component or remove it if it is irrelevant. Thus, we attempt to introduce silence in bands where the signal falls below the irrelevance threshold. In other words, the algorithm searches for a time-frequency representation which is sparser but perceptually equivalent to the original representation after resynthesis. The parameters of the algorithm were chosen to be suitable for most everyday sounds, i.e. real-world music and speech signals, and no calibration of the audio system should be necessary, so it should work on most reasonable settings for a standard PC.

The proposed algorithm uses a simple model of simultaneous masking (also referred to as spectral masking) which is based on data from the psychoacoustic literature. As mentioned above, a basic model for the simultaneous masking effect, referred to as the excitation pattern model of masking [Moore and Glasberg, 1983], is that the auditory system can detect a target presented simultaneously with a masker only if the excitation pattern of target plus masker differs significantly from that of the masker alone. If the two excitation patterns do not differ in a way detectable by the auditory system, the target cannot be perceived, i.e. it is masked [van der Heijden and Kohlrausch, 1994].
This basic model allows for the prediction of the masked threshold of a target signal in the presence of a masker signal, with certain constraints upon the stimuli. The masked threshold is defined as the minimum level of the target at which it is audible in the presence of the masker. A convolution kernel is defined by

$$F(x) = \frac{l - u}{2}\, x - \frac{l + u}{2} \sqrt{e + x^2} \qquad (2.11)$$

shifted to the point $(0, 0)$. Here the parameters are the lower frequency slope $l$, the upper slope $u$ and the non-negative parameter $e$, which controls the smoothness of the function at zero. Our method estimates the excitation pattern by applying the kernel with $l = 27$, $u = 24$ and $e = 0.3$ to single frequency components (on the Bark frequency scale). It assumes linear additivity of mutual effects (in the power domain). In this way, after transforming a spectrum to the Bark scale, this estimate can be found by convolving the spectrum with the kernel.

The masked threshold function was shifted in level by a certain amount corresponding to the results of a perceptual experiment, and all components falling below the shifted function (the irrelevance threshold) are removed. At the level shift thus determined, the subjects could not discriminate the irrelevance-filtered signal from the original signal. This approach makes it possible to cope with the uncontrolled effects associated with the removal of spectral components, which result from the overcompleteness of the Gabor frame used. Furthermore, it makes it possible to cope with inaccuracies of the masking model itself. The masking model chosen for the current algorithm does not consider the nonlinearities and complex interactions involved in auditory masking for real-world sounds mentioned above.

The irrelevance filter algorithm is implemented as a time-varying, adaptive filter. The irrelevance threshold function is calculated for each consecutive spectrum of a running signal using the simple simultaneous masking model described above. Only the components exceeding the threshold are included in the re-synthesis stage. This step is equivalent to multiplying each time-frequency point by 0 or 1. Fig. 2.4 shows the perceptually relevant TF components. This procedure is an example of an adaptive Gabor filter with a symbol consisting of zeros and ones. The underspread property [Kozek, 1998] is important, since the induced time-frequency shift should be as local as possible. The approximation process, in which only single time-frequency points are removed from the signal, was performed as accurately as possible. The goal was to obtain an operator with good time-frequency localization, i.e. an underspread operator. To achieve that goal, and following Gabor theory, a high redundancy of 8 was chosen. At this high redundancy, short on/off cycles of single components that are close to the irrelevance threshold are smoothed out, which is desirable from a psychoacoustical point of view, as sharp on/off edges cause audible artifacts. For the high redundancy and the chosen Hamming window the resulting frame is snug, i.e. nearly tight, which allowed us to use the original window also as a synthesis window, with nearly no numerical error. Together with the chosen simple masking model, which could be implemented as a convolution, this resulted in an efficient algorithm.

Additivity of nonsimultaneous masking for short Gaussian-shaped sinusoids [10]

For an extension of the irrelevance algorithm using a multiplier based on a Gabor or wavelet transform, data about the time-frequency masking spread for well-concentrated atoms, like Gaussian-windowed sinusoids, as well as the additivity of these masking effects are essential. Auditory masking has been extensively studied for non-simultaneous (temporal masking) and simultaneous (spectral masking) presentation of masker and target. Because of the specific demands in the nonsimultaneous and simultaneous masking experiments, the experimental stimuli were almost always broad either in the temporal domain, the frequency domain,


or both. Very little is known about nonsimultaneous and simultaneous masking effects for masker and target signals that are well-concentrated in both the time and frequency domains. Such well-concentrated stimuli can be arranged more flexibly in time-frequency space than temporally or spectrally broad stimuli. Thus, they are well-suited for studying masking effects with various time-frequency relations between masker and target stimuli. Compared to maskers that are broad in at least one domain, well-concentrated maskers may produce different masking effects.

In [10] we were concerned with the additivity of masking for multiple well-concentrated maskers that are separated in time. For that purpose, extensive psychoacoustical tests were performed. This, among other psychoacoustically measured data, will be used in a future extension of the irrelevance algorithm. In particular, the adaptation of the Gabor multiplier symbol should consider the additivity of the masking effect of several Gabor atoms.

Figure 2.4 (panels: "Spectogram of bach", "Symbol for Irrelevance method", "Amplitude of relevant components"; axes: time sample l, frequency sample k): TOP: The spectrogram of the test signal bach (classical music by J. S. Bach); high amplitudes are displayed darkly, low ones brightly. MIDDLE: The symbol of the Gabor filter for the irrelevance filter, black = 1, white = 0. BOTTOM: The result of the point-wise multiplication of these two sets of coefficients, representing the amplitude of relevant components. To get back to the signal domain, re-synthesis is applied.
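For reference, the construction of the 0/1 symbol of the simple irrelevance filter of [9] for a single spectrum can be sketched as follows. This is only an illustration under several assumptions that are not taken from the source: the kernel is read as $F(x) = \frac{l-u}{2}x - \frac{l+u}{2}\sqrt{e + x^2}$ in dB over Bark distance and shifted so that $F(0) = 0$, the power spectrum is assumed to be sampled uniformly on the Bark scale, and the level shift `shift_db` is a free, hypothetical parameter (the value determined in the perceptual experiment is not given here).

```python
import numpy as np

def spreading_kernel_db(x, l=27.0, u=24.0, e=0.3):
    """Assumed reading of kernel (2.11) in dB over Bark distance x, shifted so that F(0) = 0."""
    f = 0.5 * (l - u) * x - 0.5 * (l + u) * np.sqrt(e + x ** 2)
    return f + 0.5 * (l + u) * np.sqrt(e)

def irrelevance_symbol(power_spectrum, bark_step, shift_db):
    """0/1 symbol for one power spectrum sampled uniformly on the Bark scale."""
    n = len(power_spectrum)
    x = np.arange(-(n - 1), n) * bark_step               # Bark distance target - masker
    kernel = 10.0 ** (spreading_kernel_db(x) / 10.0)     # dB -> power
    # Linear additivity in the power domain: excitation = spectrum convolved with kernel.
    excitation = np.convolve(power_spectrum, kernel)[n - 1 : 2 * n - 1]
    threshold = excitation * 10.0 ** (shift_db / 10.0)   # shifted (irrelevance) threshold
    return (power_spectrum > threshold).astype(float)

# Hypothetical spectrum: one strong component and a component 60 dB weaker right next to it.
spectrum = np.zeros(64)
spectrum[10] = 1.0
spectrum[11] = 1e-6
symbol = irrelevance_symbol(spectrum, bark_step=0.1, shift_db=-20.0)
relevant = symbol * spectrum   # point-wise multiplication, as in the bottom panel of Fig. 2.4
```

In the full algorithm this symbol would be computed for each consecutive spectrum of the Gabor transform and applied before resynthesis; the transform itself, the Bark-scale resampling and the perceptually determined level shift are omitted from this sketch.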

The averaged data for the additivity of four translated Gaussian-windowed sinusoids with the same frequency can be seen in Figure 2.5.

Figure 2.5: The open symbols show the average of the experimental results for five listeners. Different masker combinations are indicated with the symbols shown in the legend. Error bars indicate 95% confidence intervals. The dotted lines indicate the predictions from linear masking additivity. The other two lines in each panel show the predictions of the model of masking additivity proposed in [C. J. Plack and Drga, 2006], using I/O functions best fitting their mean data obtained with long maskers (dashed) and I/O functions best fitting the data of the present study (solid).

As mentioned in a submitted paper (see footnote 5), it can be shown that the current models for including time-frequency masking effects are based on an inappropriate model for the combination of temporal and spectral masking. In current work we are developing an irrelevance model based on the new psychoacoustical data and a non-stationary Gabor transform [5]. The parameters will again be tested in psychoacoustical tests. When an effective irrelevance model is found, this knowledge can be used to improve perceptual coders.

Footnote 5: Necciari, T.; Savel, S.; Laback, B.; Meunier, S.; Balazs, P.; Kronland-Martinet, R. and Ystad, S. Time-frequency spread of auditory masking for spectrally and temporally maximally-compact stimuli


2.5 Applications in Acoustics: Acoustic System Estimation

2.5.1 State of the art

In many acoustical applications the parameters of a system, like the coefficients of an LTI filter, have to be estimated from a recorded signal corrupted by noise. In the special case of the estimation of head related transfer functions (HRTF), the use of exponential sweeps (ESs) as input signals has many advantages.

Figure 2.6: (LEFT) The HRTF measurement assembly at ARI. (RIGHT) In-ear microphone.

HRTFs describe the sound transmission from the free field to a place in the ear canal in terms of LTI systems and are crucial for sound localization in virtual environments. Measurements of HRTFs (see Figure 2.6) can be considered a system identification of the weakly non-linear electro-acoustic chain sound source - room - HRTF - microphone, and can be done with ESs. Those input signals show many promising properties, among them the separation of linear and nonlinear parts of weakly nonlinear systems [Müller and Massarani, 2001]. An optimization, the multiple exponential sweep method (MESM), was developed in [11] for the measurement of HRTFs in a substantially reduced amount of time. In [12] a method for denoising the measured responses using a Gabor frame multiplier was developed. In current work we analyze the statistics of the amplitude of colored noise in an overcomplete non-stationary Gabor analysis. As a non-stationary Gabor transform based on a CQT seems optimal for the analysis of an ES, we will base the future denoising algorithm on such an analysis method.

Multiple Exponential Sweep Method for Fast Measurement of Head Related Transfer Functions [11]

A head-related transfer function (HRTF) describes the sound transmission from the free field to a place in the ear canal in terms of a linear time-invariant system [Møller et al., 1995]. HRTFs contain spectral and temporal cues which vary according to the sound direction. A set of HRTFs measured for different positions can be used to create virtual free-field stimuli. Measuring individual HRTF sets for each subject is necessary for most studies on localization in virtual environments. Results of several studies imply that, to get an accurate spatial resolution, the number of HRTFs in a set should exceed 1000 positions for the upper hemisphere.

For the measurement of HRTFs the signal is presented from a given position via a loudspeaker, using a digital-analog converter (DAC) and a power amplifier. Acoustic waves propagate in the measurement room and are altered by the torso, head, and pinna of a subject. Microphones are placed in the ear canals of a subject and capture the arriving sound.

Many issues have to be considered when choosing a system identification method for acoustic systems. When the measurements are performed in noisy rooms, the background noise reduces the signal-to-noise ratio (SNR) of the measurement. Also, the equipment, especially the power amplifier, adds noise to the excitation signal. In addition to the SNR, two further issues should be considered: nonlinear distortions and time variability. Presenting signals via loudspeakers results in nonlinear distortions, mostly due to the saturation effects of the loudspeaker membrane and the nonlinearity of the gain characteristics of the power amplifier. Furthermore, because the subject's head may move during the identification process, the HRTF measurement must be considered an identification of weakly time-variant systems.
Several system identification methods were taken into consideration. The system identification with exponential sweeps (ES) showed some promising properties [Müller and Massarani, 2001], like separation of linear and nonlinear parts of weakly nonlinear systems, a high SNR (and therefore robustness to noise) and fast processing using the FFT. Thus, system identification with ES is a very good method for HRTF measurements. The proposed method, MESM, uses an appropriate overlapping and interleaving of the excitation signals, see Figure 2.7.

Interleaving: Consider the effect of applying a sweep to a weakly nonlinear system and, after a small delay, applying the same sweep to a second system. Recording the summed response signal and applying the deconvolution process will result in a signal where the harmonic impulse responses (HIRs), which result from the weak nonlinearity, of the two systems are interleaved in time. The interleaving mechanism results from delaying the excitation of the second system in such a way that its IR is placed between the IR and the second-order HIR of the first system. Therefore the measurement of the linear IRs is not disturbed. This can be best analyzed and designed in the time-frequency plane.

Overlapping: In the simplest and most straightforward method for the system identification of multiple systems, one plays a single sweep, waits for its end, waits for the length of the reverberation time, and then plays the next sweep. However, in systems with a small number of harmonics, which is the case for weakly non-linear systems, it is not necessary to wait for the end of the previous sweep. As long as the highest harmonic of the next sweep response does not interfere with the reverberation caused by the previous sweep, the sweeps may overlap in time.

A combination of these two approaches forms MESM. Note that analyzing the time-frequency representation of the method gives a good estimate of which delays can be used, see Figure 2.7. An optimization of the chosen parameters was reached by an analytical solution of the involved parameter equations.

Figure 2.7: Spectrogram of the recorded signal as an example of a system identification using MESM (two overlapping groups of two interleaving sweeps).

Note that MESM is not restricted to HRTF measurements; it can be used for the estimation of any system. For the estimation of any slowly time-varying, weakly non-linear system it will be very advantageous. For the special case of HRTF measurements, it could be shown that the measurement duration could be reduced by a factor of four, which is of particular interest as human subjects are involved. This method is connected to many projects at the Acoustics Research Institute, e.g. [Goupell et al., 2010, Kreuzer et al., 2009, Majdak et al., 2011, Marelli et al., 2008], but it is also used by other scientists, see e.g. [Rébillat et al., 2011, Enzner, 2009, Farina, 2009].

A Time-Frequency Method for Increasing the Signal-To-Noise Ratio in System Identification with Exponential Sweeps [12]

Exponential sweeps (ESs), as mentioned above, are used in the field of audio engineering to measure impulse responses (IRs) of acoustic and electroacoustic systems. Such measurements are usually contaminated by environmental noise. Even though environmental noise is often modeled as an independent and identically distributed (i.i.d.) process, most environmental noise sources have a non-flat spectral characteristic (colored noise). In this study, we propose a method that improves the SNR when systems with frequency-dependent response decay are measured under colored-noise conditions.

In acoustics, most denoising methods have been developed for speech or music. Many of these methods rely on the Wiener solution [Wiener, 1949], which represents a mean-square-error (MSE) optimum for stationary signals assuming a contamination with an i.i.d. process. For colored noise, spectral subtraction is used, where the spectral noise signature is subtracted from the recorded signal. Those methods modify the signal in each time window independently, which leads to artifacts like speech distortions or musical noise [Vary, 1985]. A sweep-based method for improving the SNR has been proposed in [Xia, 1997].
Even though this method shows promising results, it is limited to very short IRs, relies on the frequency-independent variance of the noise, and does not incorporate the properties of system identification with ESs. In contrast to [Xia, 1997], our method uses the a priori knowledge about the TF characteristic of ESs and the fact that the system response decays with time. Further, by using frame theory, we approach perfect reconstruction of clean signals and avoid artifacts like musical noise. The method does not rely on any assumption about the noise other than stationarity. It is able to handle any arbitrary broadband delay in the recorded signal.

In our method, we represent the recorded response to the ES in the TF plane and classify parts of that plane as either environmental noise or deterministic signal, with the goal of obtaining a connected region defined as the signal region. In contrast to most speech-denoising methods, our method uses hard thresholding: the parts considered as signal are not modified and the parts considered as noise are removed. The classification as either signal or noise is done for each frequency band independently and thus does not rely on any assumption about the spectral characteristic of the noise. By applying a Gabor frame multiplier corresponding to the signal region, we obtained a denoised version of the recorded signal, which is used to estimate the IR of the measured system. Ideally, this method provides both accurate identification of the measured system and noise reduction. The method is evaluated by comparing the SNR in the noisy and denoised IRs.

In particular, we choose a Gaussian Gabor window and its canonical dual for reconstruction. To find the multiplier symbol, we have to define the signal region, whose coefficients are kept, while the other coefficients are erased, i.e. multiplied by zero. The start of the signal region is set to the analytically known time-frequency position of the input sweep, called the sparse TF representation of the sweep in Figure 2.8 (all involved filters are assumed to be causal). The end is found by analyzing each frequency band. An investigation of a part where it is known that only noise exists led to an estimate of the statistical properties of the noise. When the smoothed response in a frequency band falls below a certain threshold derived from the mean value and standard deviation of the noise, this is considered the end of the signal region. The resulting symbol, called the sparse mask in Figure 2.8, is broadened by convolving it with the absolute value of the Gram matrix, to take the reproducing property into account and to guarantee that no significant coefficients of the signal are lost.
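The per-band classification described above can be sketched as follows. This is a simplified illustration, not the implementation of [12]: the TF coefficients are assumed to be given as an array, the start indices are assumed to be known, the moving-average smoothing and the factor `n_std` are hypothetical choices, and the broadening of the mask with the Gram matrix is omitted.

```python
import numpy as np

def signal_region_mask(coeffs, start, noise_frames, n_std=3.0, smooth=5):
    """Binary mask of the signal region, found independently in each frequency band.

    coeffs:       array (bands, frames) of TF coefficients
    start:        first frame of the signal region in each band (assumed known)
    noise_frames: slice of frames known to contain only noise
    """
    mag = np.abs(coeffs)
    mask = np.zeros(mag.shape)
    box = np.ones(smooth) / smooth
    for k in range(mag.shape[0]):
        noise = mag[k, noise_frames]
        threshold = noise.mean() + n_std * noise.std()   # per-band noise statistics
        smoothed = np.convolve(mag[k], box, mode="same")
        below = np.flatnonzero(smoothed[start[k]:] < threshold)
        end = start[k] + below[0] if below.size else mag.shape[1]
        mask[k, start[k]:end] = 1.0                      # hard thresholding: keep or erase
    return mask

# Hypothetical test data: exponentially decaying responses on a small noise floor.
t = np.arange(200)
noise_floor = np.where(t % 2 == 0, 0.01, 0.02)
starts = [5, 10, 20]
coeffs = np.array(
    [noise_floor + np.where(t >= s, np.exp(-(t - s) / 10.0), 0.0) for s in starts]
)
mask = signal_region_mask(coeffs, starts, slice(150, 200))
denoised_coeffs = mask * coeffs   # symbol applied point-wise before resynthesis
```

Because the threshold is estimated separately in every band, the sketch makes no assumption about the spectral shape of the noise, which mirrors the design choice described in the text.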
The proposed method improves the SNR in the impulse response measured with exponential sweeps, as seen in Figure 2.8 and other simulation results. In the low-SNR conditions, the SNR improves compared to direct measurement and/or block-thresholding. In the high-SNR conditions, the method does not fail, i.e., it does not introduce artifacts. Assuming stationary noise, a decaying system response, and an exponential sweep as the excitation signal, our method shows promising results in denoising measurements of electro-acoustic systems. However, our method still seems to be far from the optimal solution. For example, the noise estimator does not use any statistical model. Exploiting an appropriate statistical model for the amplitude of the overcomplete TF representation of the noise can improve the estimation. Also, the separation acuity is low in the low-frequency region and may be improved using the non-stationary Gabor transform [16] in terms of a constant-Q transform.


Figure 2.8: TF representations of the recorded signal (a) and the exponential sweep (b). Sparse TF representation of the sweep (c). Sparse (d) and broadened mask (e) containing the signal region. TF representation of the output of the Gabor multiplier (f). Spectral representation of the differences in the noisy (g, black) and the denoised (g, colored) IRs relative to the clean IR. Note the smaller errors in the denoised condition, especially for the higher frequencies.
