Audio. Wavelets. Neural Nets


Technische Universitaet Berlin
Institut fuer Telekommunikationssysteme
Fachgebiet Nachrichtenuebertragung

Audio Content Description with Wavelets and Neural Nets

Diploma Thesis, November 2003
Stephan Rein (94434)

Prof. Dr.-Ing. Thomas Sikora
Prof. Dr. Martin Reisslein (a)
Dr. Nicolas Moreau

(a) Prof. Dr. Martin Reisslein is with the Dept. of Electrical Engineering, Arizona State University.

Abstract

We examine MPEG-7 audio tools and Fourier techniques for precision audio content description. We find that the MPEG-7 dyadic scaling procedure and the short-term Fourier transform are less suitable for describing highly complex audio content when generalization is required. We develop a novel wavelet envelope descriptor with good generalization properties for audio content description, and a methodology for the statistical, descriptive analysis of wavelet data from which elementary content description tools can be derived. We examine combinations of 39 different wavelets with three different types of neural nets for precision audio content description, and obtain promising results for specific wavelets combined with a probabilistic radial basis net. The proposed methodology is designed for an identification service for classical movements (the musical parts of a composition) to be realized by next-generation internet search machines. The content description data can be computed efficiently and generalizes the audio content: individual compositions are identified even if they are very similar to each other and differ significantly from the identification system's example sets. The radial net is trained with vectors from 96 example pieces and allows the retrieval of 32 novel classical movements. The system obtains a success rate of 78 % when trained with only three independent example sets; a training procedure that processes a large number of independent example sets is not necessary. In addition, a similarity vector containing the labels of similar pieces is computed as a possible answer to a user query. Our methodology employs a novel wavelet dispersion measure that evaluates the ranks obtained by wavelet coefficients. This measure efficiently describes and summarizes highly complex wavelet patterns and therefore complements current signal communication techniques.

CONTENTS
I Introduction
  I-A Related Work
II The Audio Data Base
III MPEG-7 Audio Content Descriptors
  III-A Summary
IV Survey on Wavelets
  IV-A Wavelets for Audio Content Description
V Content Description with Wavelets
  V-A Gaussian Wavelet Envelope Descriptor
  V-B Statistical Wavelet Analysis for Content Description
    V-B.1 Statistical Data Summarization Tools
    V-B.2 Scale Frequency Measure
    V-B.3 Percentile Correlations
  V-C Summary
VI A Novel Wavelet Dispersion Measure
  VI-A Wavelet Dispersion Classifier Matrix
  VI-B Wavelet Dispersion Measure Dimension Reduction
  VI-C Wavelet Dispersion Measure Performance Indicator
VII Neural Nets for Audio Classification
  VII-A Perceptron Neural Networks
  VII-B Multilayer Feedforward Backpropagation Neural Networks
  VII-C Probabilistic Radial Basis Neural Network
VIII Performance Analysis
  VIII-A Wavelet Performance
  VIII-B Similarity Matrix
  VIII-C Summary
IX Conclusion
Appendix
  A German Summary (Deutsche Zusammenfassung)
  B Definition of Statistical Summarization Tools
  C Percentile Plots
  D MPEG-7 Audio
    D.1 Basic
    D.2 Audio Power
    D.3 Basic Spectral
    D.4 Spectral Basis
    D.5 Signal Parameters
    D.6 Timbre Descriptors
  E Audio Data Base
  F Matlab Code for Calculation of a Weighting Matrix
  G Matlab Code for Key Results Wavelet Dispersion Measure
  H Matlab Code for a Probabilistic Neural Net
References

I. INTRODUCTION

Due to the immense and growing amount of audiovisual data available worldwide, developing techniques for content retrieval and classification has become a challenging task. The description of multimedia content is the key to improving and accelerating various current technologies, and it will also create interdependencies between these technologies that realize completely novel applications, as shown in Figure 1.

Fig. 1. CONTENT DESCRIPTION IS THE KEY TO THE IMPROVEMENT OF VARIOUS TECHNOLOGIES AND THUS CAN REALIZE COMPLETELY NOVEL APPLICATIONS, SUCH AS INTELLIGENT AND INTERACTIVE PERSONAL DIGITAL ASSISTANTS. (Figure: content description combines voice recognition, artificial intelligence, and multimedia content retrieval into next-generation applications; example assistant dialogue: "I need detailed information on wavelets." / "Your schedule allows for study of the multimedia online tutorial An Introduction to Wavelets, by A. Graps.")

Next-generation internet search machines will be able to understand and process multimedia content. More precisely, a user query can be a mixture of multimedia data, including text, voice, picture, and video content, and the search machine will answer with content that is highly related to the query and relevant to the user. In this thesis we present a novel audio retrieval methodology that is readily applicable to next-generation internet search machines, see Figure 2. The proposed technique generalizes well: it can identify audio data that is not part of the search system's example set. The methodology can be extended to the description of multidimensional content. Audio content description for internet search machines requires the description methodology to meet certain requirements.
Ideally, the proposed methodology keeps the concept of current internet search algorithms, so that internet multimedia retrieval can be enabled by a software update. The requirements for an applicable methodology are as follows:
1) Due to the immense amount of audio data available worldwide, the descriptors must have a very compact representation.
2) The methodology must provide an efficient computation scheme for constructing these descriptors.
3) An efficient mapping procedure for the descriptors is necessary to allow a user-oriented search and retrieval service.
4) The methodology must be readily applicable: there must be no constraints that let the descriptors perform well only under certain circumstances. The descriptors shall work for the various kinds of audio data available worldwide.

Fig. 2. A NOVEL WAVELET DISPERSION MEASURE ALLOWS A METHODOLOGY FOR EFFICIENT AUDIO CONTENT RETRIEVAL TO BE INTEGRATED IN NEXT GENERATION INTERNET SEARCH MACHINES. (Figure: a theatre education class sends a 4-second extract of unknown background music from a student's movie as an audio content query over the INTERNET to a server with content descriptors; the server identifies the composition, J.S. Bach, Sonata No. 1, Part IV, recorded 1957 by Y. Menuhin, and finds similar compositions.)

This thesis is organized as follows. Section I-A surveys related work. There is a large body of audio content description literature; however, to the best of our knowledge, a methodology for identifying highly complex musical audio recordings that are not part of the search system's data base has not yet been proposed. In Section II we present the audio recordings we employed for deriving and evaluating our methodology. Precision descriptors must be able to identify and categorize audio compositions within a musical genre. This is a complex challenge, because the audio data to be categorized into different classes is very similar. In Section III we study MPEG-7 audio description tools, which are mainly based on Fourier techniques. We calculate Fourier coefficients, discuss the standardized dyadic scaling procedure, and find that these tools are not designed for high-precision classification. In Section IV we compare Fourier analysis with wavelet analysis. Both techniques are based on the same concept; however, wavelets are designed to describe very irregular and nonstationary signals and are therefore well suited to audio content description. In Section V we examine wavelet techniques for audio content description. We find that very specific patterns in the wavelet domain represent the audio content. The wavelet coefficients measure the similarity between the signal and the scaled and shifted mother wavelet functions.
These similarities describe special audio content features from which descriptors that generalize can be constructed. Such generalization is essential for identifying audio compositions that are unknown to the classification system. The methodology proposed in this thesis is the result of a variety of analytic calculations and experiments. We develop a novel analytic wavelet envelope descriptor and a methodology for the statistical analysis of wavelet data. Although these tools are not included in our final methodology, Section V reports the corresponding findings, because they led to a novel statistical dispersion measure and might be useful for further investigations. In Section VI we propose this novel wavelet dispersion measure and find that it efficiently describes the wavelet patterns discovered in Section V. Our experiments indicate that the wavelet dispersion data can be processed by a neural net, which realizes a computationally efficient mapping and classification technique. We expect this measure also to be useful for improving other signal communication techniques, including speech recognition, because it extracts specific time-domain features that allow for content identification and generalization. In Section VII we give a short tutorial on the three types of neural nets employed in our performance analysis and explain why neural nets can be very useful for audio classification problems. In Section VIII we examine the performance of our wavelet dispersion measure for different wavelet families and different wavelets, and we measure the success rate of the dispersion measure when combined with each of the three types of neural nets. The final methodology achieves a mean success rate of 78 % for 32 highly complex audio pieces from a recording that is not in the search system's data base.
For each of the 32 very similar pieces, the identification system employs an example set of three pieces from different recordings. The identification success rate for audio files known to the system is approximately 100 %. In Section IX we summarize our findings and outline further investigations.

TABLE I
SONATAS AND PARTITAS FOR THE SOLO VIOLIN, COMPOSED BY J.S. BACH AROUND 1720. THE SIX PIECES ARE PRESENTED IN PAIRS, ALTERNATELY SONATA-PARTITA. ESPECIALLY THE SONATAS HAVE A VERY SIMILAR STRUCTURE, THUS RESULTING IN A SPECIAL CHALLENGE FOR AUDIO CONTENT CLASSIFIERS. OUR PRECISION CLASSIFIERS ARE EVALUATED FOR DISTINCTION BETWEEN THESE 32 HIGHLY COMPLEX MOVEMENTS.

Sonata No. 1 (G minor) | Sonata No. 2 (A minor) | Sonata No. 3 (C major)
I Adagio | I Grave | I Adagio
II Fugue | II Fugue | II Fugue: Alla breve
III Siciliano | III Andante | III Largo
IV Presto | IV Allegro | IV Allegro assai

Partita No. 1 (B minor) | Partita No. 2 (D minor) | Partita No. 3 (E major)
I Allemande | I Allemande | I Preludio
II Double | II Courante | II Loure
III Courante | III Sarabande | III Gavotte en Rondeau
IV Double | IV Gigue | IV Menuet I
V Sarabande | V Chaconne | V Menuet II
VI Double | | VI Bourrée
VII Bourrée | | VII Gigue
VIII Double | |

A. Related Work

There is a large body of literature on audio content description, sound classification, and audio retrieval. It includes audio fingerprinting systems that identify songs known to the search system's data base, see for instance [1] [2] [3] [4] [5]. Our system differs from these works in that it identifies unknown complex audio with a high success rate. The literature also includes retrieval systems that categorize different sounds. Generally, such a system is trained with a number of example sounds and then classifies novel sound segments into content-based classes, see for instance [6] [7] [8] [9] [10]. In [11], MPEG-7 tools and Mel-frequency cepstrum coefficients are combined with hidden Markov models to label sports audio data with one of 6 sound classes. In [12] [13], support vector machines and line methods are employed to classify audio sounds into 6 sound classes. In [14], wavelet data is employed to classify data files containing speech, music, and sounds.
There are systems for artist detection [15] and music type detection [16] [17] [18]. Our system differs from these classification systems in that it identifies movements from different performances (differing, among other things, in background noise) of the same highly complex classical composition. Furthermore, it employs a novel wavelet summarization measure. To the best of our knowledge, this is the first work to propose a methodology for identifying highly complex musical audio recordings that are not part of the search system's data base.

II. THE AUDIO DATA BASE

There is a huge, unmanageable body of music recordings, and categorizing it in terms of user relevance and importance is not possible. There are reference audio data bases for different sounds, including birds, telephones, or laughter (see Other audio data bases are constructed from popular charts. In this thesis we employ a specific data base, because we want to classify audio pieces within one genre. We have chosen six pieces composed by Johann Sebastian Bach, the six Sonatas and Partitas for Solo Violin, Bach-Werke-Verzeichnis (BWV) 1001-1006, as shown in Table I. Appendix E describes the employed recordings in detail, including durations and labels. The data base fulfills the following requirements:

1) The audio data base must be of consistent relevance. The frequently employed charts do not fulfill this requirement, because they are not stable: classification techniques derived using the charts of 2003 might not work reasonably with the charts of 2002. Bach composed the Sonatas and Partitas around 1720; they are standard literature for the violin, the most frequently employed instrument in classical music. Bach's music is current today and will still be current years from now.

TABLE II
WE CONSIDER FOUR DIFFERENT PERFORMANCES OF THE SONATAS AND PARTITAS, THUS RESULTING IN 128 AUDIO FILES WITH A TOTAL LENGTH OF APPROXIMATELY HOURS. THE RECORDINGS REPRESENT DIFFERENT LEVELS OF QUALITY. THE RECORDING TECHNIQUE OF 1934 DID NOT ALLOW FOR CORRECTION OF PERFORMANCE ERRORS.

Player | Year | Studio | Location | Label
Yehudi Menuhin | | Studio Albert | Paris | EMI Records
Yehudi Menuhin | 1957 | Abbey Road Studios | London | EMI Records
Jascha Heifetz | 1952 | RCA Studios | Hollywood | Bertelsmann Music Group
Nathan Milstein | 1973 | Conway Hall | London | Polydor International GmbH

2) For interpretable results, an available manuscript describing the musical compositions is useful. A basic study of the audio material allows for a temporal as well as a frequency coverage of the musical events. According to our findings, even in statistical experiments a relation to single events within the composition is comprehensible and can be exploited for deriving novel techniques. Furthermore, a manuscript ensures that comparable performances are available. Our data base comprises four performances by three different players, as shown in Figure 3. Such performances allow for the construction of descriptors with good generalization properties.

3) The considered audio files shall represent different levels of quality. Next-generation internet search machines have to process audio information of varying quality, and our four recordings cover a wide range of the audio qualities available today. Yehudi Menuhin made the first complete recording of the Sonatas and Partitas; this recording represents the studio recording technique of 1934.
Apart from its audio quality constraints, this historical recording is special because technical performance errors could not be corrected: a studio performance had to be recorded without any breaks or re-recordings. We consider the recording of Nathan Milstein to represent up-to-date audio quality. Although this performance was recorded in 1973 using analog techniques, there is no measurable difference to state-of-the-art digital recordings in terms of audio content information; the digital technique is mainly relevant for consumer-oriented lossless audio reproduction. For our experiments we use a digitally remastered studio copy that reproduces the original sound image of the recorded performance.

4) The considered compositions shall exhibit polyphonic, non-separable phenomena. Bach's Sonatas and Partitas demand that the player play different chords concurrently. Although there is only one solo violin, the result sounds comparable to a performance by several violinists. This is a special challenge to the compactness of the descriptors. Although there are multiple voices, a dimension reduction, for example by a subspace estimation technique, is not possible, because the different voices do not fulfill the statistical requirements of this technique. Furthermore, such a separation is difficult because the voices do not occur in fixed frequency bands.

For the extraction of audio content description features, the recordings were downsampled to 8 kHz using the software Cool Edit 2.3 (see

III. MPEG-7 AUDIO CONTENT DESCRIPTORS

The MPEG-7 (Moving Picture Experts Group) standard is an ISO/IEC standard that describes a variety of content description tools. The standard makes content retrieval applications possible and compatible, so that content queries of professional as well as ordinary users can be answered efficiently. To obtain broad generality, the standard does not standardize or evaluate the content retrieval applications themselves.
In this section we study the suitability of the standardized MPEG-7 audio descriptors for precision classification. MPEG-7 audio provides 1) a platform for the description data, 2) low-level tools, and 3) high-level tools.

Fig. 3. FOUR PERFORMANCES RECORDED BY THREE STARS OF THE LAST CENTURY: YEHUDI MENUHIN, JASCHA HEIFETZ, AND NATHAN MILSTEIN. THE RECORDING OF YEHUDI MENUHIN MADE AROUND 1935 WAS THE FIRST COMPLETE RECORDING OF BACH'S SONATAS AND PARTITAS FOR THE SOLO VIOLIN. MORE THAN TWENTY YEARS LATER, Y. MENUHIN RERECORDED THE SONATAS AND PARTITAS. OUR METHODOLOGY ALLOWS FOR IDENTIFICATION OF THE AUDIO COMPOSITION AND THE PART WITHIN THE AUDIO COMPOSITION. IF THE RECORDING IS KNOWN TO THE SYSTEM, THE PLAYER AND THE DATE OF THE RECORDED PERFORMANCE ARE ALSO GIVEN.

The platform for the description data is an interface that provides compatibility between the different applications built on MPEG-7 audio descriptors. This interface describes a set of standardized data containers, which store the data provided by the audio content descriptors. Precision classifier data could be stored in such containers, thus allowing interoperability between different applications. The high-level tools combine the low-level tools to enable a variety of high-level applications: they are designed for audio signature description, musical instrument timbre description, melody description, general sound recognition and indexing description, and spoken content description. For these description tools to be useful, the low-level descriptors must provide meaningful features of the audio signal. Therefore, for precision classification, a study of the low-level tools is essential. The low-level tools include techniques to describe time-domain and frequency-domain features of audio signals. We have thoroughly studied these tools for possible inclusion in our methodology. Appendix D details the formulas of the low-level MPEG-7 descriptors; here we only briefly discuss their functionality and concentrate on the MPEG-7 elementary description tools. Table V shows the MPEG-7 Basic and Basic Spectral descriptors.
The Basic descriptors are useful for displaying audio signals. The Basic Spectral descriptors provide elementary spectral features of audio signals. Table VI shows the MPEG-7 dimension reduction tools. Generally, a compact representation of the descriptive data is useful for audio content descriptors, and this dimension reduction technique can also separate the voices of different musical instruments. Table VII summarizes the MPEG-7 timbre descriptors. These descriptors distinguish between different tonal components; for example, the sounds of different instruments can be classified. In addition, MPEG-7 provides a simple silence detection tool. Many of the descriptors detailed here have already been evaluated in the related literature, and they allow for the classification of different elementary sounds. The MPEG-7 dimension reduction technique is of secondary importance for precision classifiers, which generally aim to identify audio pieces by analyzing only very small extracts or segments. The MPEG-7 silence detector is not useful for our consistently loud audio pieces. Among all these MPEG-7 audio descriptors, we consider the Basic Spectral descriptors most relevant for precision content description. In particular, the Audio Spectrum Envelope and the Audio Spectrum Flatness descriptors allow for the extraction of data that describes tonal components. The Audio Spectrum Envelope descriptor is of special relevance, because it provides the Fourier coefficients processed by almost all other MPEG-7 descriptors. Furthermore, a specific dyadic scaling procedure is specified for the Audio Spectrum Envelope, which is also used for the Audio Spectrum Flatness descriptor. Therefore, we now examine the Audio Spectrum Envelope descriptor.

TABLE III
EDGE FREQUENCIES IN [HZ] FOR LOGARITHMIC BANDS FOR m = -16, ..., 8 AND A RESOLUTION OF r = 1/4

a) Audio Spectrum Envelope: The Audio Spectrum Envelope descriptor employs a short-time Fourier transform with overlapping Hamming windows. Let lw denote the length of the analysis window in samples. The position of each window is described by a shift h, the number of samples by which the Hamming window is slid over the audio file to obtain the next analysis window position. Let s(n) denote the Hamming-windowed audio signal and N the fast Fourier transform size, which, to allow the use of fast Fourier techniques, is obtained by rounding lw up to the next power of 2. As a consequence, the analysis window is enlarged by zero padding, resulting in a larger number of Fourier coefficients. This is only a pseudo-enlargement of the frequency resolution: zero padding interpolates the spectrum but adds no new information. The Fourier coefficients X_w(k) are calculated as

$$X_w(k) = \sum_{n=1}^{N} s(n)\, e^{-j 2\pi (k-1)(n-1)/N}, \qquad 1 \le k \le N. \qquad (1)$$

Each coefficient belongs to one of N frequencies; due to the symmetry of the Fourier transform of a real signal, only half of these frequencies are retained. The frequency distance DF between two adjacent frequencies is

$$DF = \frac{sr}{N}, \qquad (2)$$

where sr denotes the sampling rate. This is a standard procedure to obtain the Fourier coefficients of an analysis frame. The MPEG-7 standard additionally specifies a grouping of these coefficients to obtain a logarithmic frequency axis, which is motivated by the logarithmic frequency properties of the human ear. To obtain such an axis, logarithmic frequency bands are defined whose edge frequencies are

$$f_{edge} = 2^{rm} \cdot 1\,\mathrm{kHz}, \qquad m \in \mathbb{Z}, \qquad (3)$$

where r is the resolution within an octave; for r = 1/4 there are four edge frequencies per octave. Table III shows the calculated edge frequencies for m = -16, ..., 8 and r = 1/4. The 25 edge frequencies result in 24 bands.
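Equations (2) and (3) can be checked numerically. The minimal Python sketch below rounds the window length up to a power of 2, computes the bin spacing DF, and generates the 25 MPEG-7 edge frequencies for r = 1/4. The window length of 240 samples (30 ms at 8 kHz) and the helper names fft_size, bin_spacing, and edge_frequencies are hypothetical choices for illustration, not parameters or code taken from this thesis.

```python
def fft_size(lw):
    """Next power of 2 that is >= the window length lw; the gap is zero-padded."""
    return 1 << (lw - 1).bit_length()


def bin_spacing(sr, n_fft):
    """Frequency distance DF between adjacent Fourier coefficients, Eq. (2)."""
    return sr / n_fft


def edge_frequencies(r=0.25, m_min=-16, m_max=8):
    """Logarithmic band edges f_edge = 2**(r*m) * 1 kHz, Eq. (3), in Hz."""
    return [1000.0 * 2.0 ** (r * m) for m in range(m_min, m_max + 1)]


sr = 8000   # sampling rate after downsampling (Section II)
lw = 240    # hypothetical window: 30 ms at 8 kHz, for illustration only
n_fft = fft_size(lw)
edges = edge_frequencies()
print(n_fft, bin_spacing(sr, n_fft))    # 256 31.25
print(len(edges), edges[0], edges[-1])  # 25 62.5 4000.0
```

The lowest and highest edges reproduce the loedge and hiedge values of 62.5 Hz and 4000 Hz used below; with 8 kHz sampling, hiedge coincides with the Nyquist frequency.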
Each band is represented by a mean value calculated from the Fourier coefficients that belong to this band. The frequencies 62.5 Hz and 4000 Hz are denoted loedge and hiedge. Two additional values are calculated for the out-of-band energy in the ranges 0, ..., loedge and hiedge, ..., sr/2, where sr/2 is the Nyquist frequency. Importantly, an assignment rule has to be followed when calculating a logarithmic band value: Fourier coefficients whose frequencies lie within DF/2 of a band edge are shared between the two adjacent bands, so that each band retains a part of the coefficient. A linear weighting function determines these parts, as explained in Figure 4. In fact, a logarithmic frequency band contains Fourier coefficients from its left band edge - DF/2 to its right band edge + DF/2; however, the coefficients near the edges are only partially counted, using the weighting function shown in Figure 4. For a computationally efficient realization of this method, we propose the construction of a weighting matrix; the corresponding Matlab code is given in Appendix F. We first calculate the short-term Fourier coefficients and store them in a matrix C with N rows, the number of Fourier frequencies, and F columns, the number of analysis frames; each column of C contains the Fourier coefficients of one analysis frame. With an appropriate weighting matrix W, the matrix D containing L logarithmic band values per column is given as

$$D = (W \cdot C) \circ \tilde{W}, \qquad (4)$$

where $\tilde{W}$ is a normalization matrix derived from the number of Fourier coefficients considered for each logarithmic band value. In Equation (4), $\cdot$ denotes a matrix product, whereas $\circ$ denotes an element-by-element product. The matrix $\tilde{W}$ turns the summed Fourier coefficients into mean values and is constructed from the row sums of W; the resulting column vector is repeated F times to form an L-by-F matrix $\tilde{W}$.

Fig. 4. THE FOURIER COEFFICIENTS HAVE TO BE WEIGHTED FOR CALCULATION OF A LOGARITHMIC BAND VALUE. THEREBY THE FOURIER COEFFICIENTS ARE SCALED FROM A LINEAR AXIS TO A LOGARITHMIC AXIS. THIS PROCEDURE IS SPECIFIED IN MPEG-7 DUE TO THE LOGARITHMIC SCALING PROPERTIES OF THE HUMAN EAR. (Figure: value of the weighting function over the linear frequency axis for one logarithmic frequency band, with DF/2 intervals marked around the left and right band edges.)

TABLE IV
EXAMPLE OF A WEIGHTING MATRIX WITH THE FIRST 2 COLUMNS AND THE FIRST 8 ROWS. EACH ROW SELECTS FOURIER COEFFICIENTS FOR A LOGARITHMIC BAND VALUE. VALUES LARGER THAN 0 AND SMALLER THAN 1 INDICATE THAT FOURIER COEFFICIENTS ARE SHARED BETWEEN ADJACENT LOGARITHMIC BANDS. WE PROPOSE SUCH A MATRIX FOR EFFECTIVE CALCULATION OF THE LOGARITHMIC BAND VALUES.

A column vector of D then contains the logarithmic band values of one analysis frame. Each value of such a column is the scalar product of a row of the weighting matrix and a column of the Fourier matrix C; each row of the weighting matrix therefore has to select the appropriate values of a column of C to construct a logarithmic band value. Consequently, the weighting matrix must contain as many rows as there are logarithmic bands and as many columns as there are Fourier frequencies. Table IV shows an example of a weighting matrix. This matrix, and especially the matrix $\tilde{W}$, allow for inspection of the number of Fourier coefficients that each logarithmic band contains. Generally, this method of obtaining a logarithmic frequency axis is very sensitive to the choice of the logarithmic edge frequencies.
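The weighting-matrix approach of Equation (4) can be sketched in Python as an illustration; it is not a translation of the Matlab code in Appendix F, which is not reproduced here. Three assumptions are made and labeled in the code: the helper names band_edges, weighting_matrix, and band_values are hypothetical; the FFT size of 512 is an illustrative choice; and the linear sharing rule is modelled as the fractional overlap of each bin's DF-wide interval with the band. The normalization multiplies by the reciprocal of each row sum of W, which is our reading of how the normalization matrix turns sums into means.

```python
def band_edges(r=0.25, m_min=-16, m_max=8):
    """Edge frequencies f_edge = 2**(r*m) * 1000 Hz from Eq. (3)."""
    return [1000.0 * 2.0 ** (r * m) for m in range(m_min, m_max + 1)]


def weighting_matrix(edges, n_bins, df):
    """L-by-n_bins matrix W: row b holds the share of each Fourier bin inside band b.

    Assumption: bin k covers [k*df - df/2, k*df + df/2]; its share is the overlap of
    that interval with the band divided by df, so a bin within df/2 of an edge is
    split linearly between the two neighbouring bands."""
    w = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        row = []
        for k in range(n_bins):
            left, right = k * df - df / 2, k * df + df / 2
            row.append(max(0.0, min(right, hi) - max(left, lo)) / df)
        w.append(row)
    return w


def band_values(w, c):
    """Eq. (4): D = (W . C) o W~, where W~ repeats 1 / (row sum of W) for every frame."""
    n_frames = len(c[0])
    d = []
    for row in w:
        total = sum(row)  # (fractional) number of Fourier coefficients in this band
        scale = 1.0 / total if total > 0 else 0.0  # empty band -> 0 instead of a division error
        d.append([scale * sum(wk * c[k][f] for k, wk in enumerate(row))
                  for f in range(n_frames)])
    return d


sr, n_fft = 8000, 512          # hypothetical FFT size, for illustration only
df = sr / n_fft
n_bins = n_fft // 2 + 1        # bins 0 .. sr/2 kept after exploiting symmetry
W = weighting_matrix(band_edges(), n_bins, df)
C = [[1.0, 2.0] for _ in range(n_bins)]  # toy spectrum: 2 frames, constant per frame
D = band_values(W, C)
print(len(W), len(W[0]), len(D), len(D[0]))  # 24 257 24 2
```

For a spectrum that is constant within each frame, every band value equals that constant, which is a quick sanity check that the normalization produces means. Entries of W strictly between 0 and 1 mark the shared bins, in the spirit of Table IV.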
Even when using the MPEG-7 default values for the lowest logarithmic edge frequency and the frequency resolution, a logarithmic band may remain empty, resulting in unsuitable content description data. Furthermore, the band width, and hence the number of coefficients per band, grows exponentially with the band index, so the lower bands contain significantly fewer coefficients than the higher bands. In our example, the first two bands contain less than one coefficient each. We expect such a system of unequally filled bands to be sensitive to aliasing errors. Figure 5 shows the logarithmic band values of our example when using the MPEG-7 default values for the lowest band edge frequency (62.5 Hz) and the frequency resolution (4 frequencies per octave). A fine structure is visible in only half of the range of logarithmic bands. Such a scaling procedure may therefore distort audio content information when processing highly complex audio data.
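The unequal filling of the bands follows directly from Equations (2) and (3): each band holds about (band width)/DF coefficients, and the band width grows by a factor of 2^(1/4) from band to band. The short Python sketch below assumes a hypothetical FFT size of N = 512, i.e. DF = 15.625 Hz at 8 kHz. With this choice, the first two bands receive fewer than one coefficient each and the widest band receives about 54 times as many as the first. Other window lengths shift these numbers, so the sketch does not restate the thesis's exact parameters.

```python
def coefficients_per_band(sr=8000, n_fft=512, r=0.25, m_min=-16, m_max=8):
    """Expected number of Fourier coefficients per logarithmic band: band width / DF."""
    df = sr / n_fft                                                     # Eq. (2)
    edges = [1000.0 * 2.0 ** (r * m) for m in range(m_min, m_max + 1)]  # Eq. (3)
    return [(hi - lo) / df for lo, hi in zip(edges[:-1], edges[1:])]


counts = coefficients_per_band()  # hypothetical N = 512
for band in (1, 2, 3, len(counts)):
    print(f"band {band:2d}: {counts[band - 1]:5.2f} coefficients")
```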

TABLE V
MPEG-7 BASIC AND BASIC SPECTRAL DESCRIPTORS PROVIDE A BASIC TIME DOMAIN ANALYSIS AND A BASIC FREQUENCY DOMAIN ANALYSIS.

Group | Descriptor | Description
Basic | Audio Waveform | minimum and maximum amplitude value within an audio frame
Basic | Audio Power | temporally smoothed instantaneous power
Basic Spectral | Audio Spectrum Envelope | short-time Fourier transform coefficients; search and comparison
Basic Spectral | Audio Spectrum Centroid | center of gravity of the log-frequency power spectrum; shape of the power spectrum; indicates dominance of either high or low frequencies in the spectrum; measure of perceptual timbre
Basic Spectral | Audio Spectrum Spread | second moment around the Audio Spectrum Centroid; dispersion of the power spectrum; tone/noise sound distinction
Basic Spectral | Audio Spectrum Flatness | deviation from a flat spectral shape for frequency bands; can indicate tonal components

TABLE VI
MPEG-7 SPECTRAL BASIS AND SIGNAL PARAMETERS DESCRIPTORS. THE SPECTRAL BASIS DESCRIPTORS USE A SINGULAR VALUE DECOMPOSITION TO RETAIN ONLY STATISTICALLY RELEVANT FEATURES. THE SIGNAL PARAMETERS DESCRIBE THE SIGNAL'S PERIODICITY.

Group | Descriptor | Description
Spectral Basis | Audio Spectrum Basis | statistical basis functions to reduce the dimension of spectrum data
Spectral Basis | Audio Spectrum Projection | uses the Audio Spectrum Basis for a low-dimensional representation of the spectrum
Signal Parameters | Audio Fundamental Frequency | describes the signal's fundamental frequency
Signal Parameters | Audio Harmonicity | the spectrum's harmonicity; distinction between different sounds

TABLE VII
MPEG-7 TIMBRE DESCRIPTORS. THE TIMBRE DESCRIPTORS DESCRIBE MUSICAL AND PERCEPTUAL TIMBRE OR TONE QUALITY INDEPENDENT OF LOUDNESS AND PITCH.

Group | Descriptor | Description
Temporal Timbre | (group) | temporal characteristics of segments; single value for tone quality
Temporal Timbre | Log Attack Time | signal's time to rise from silence to maximum amplitude
Temporal Timbre | Temporal Centroid | locates the focus of the signal's energy; distinction between decaying and sustained tones
Spectral Timbre | (group) | spectral features in linear frequency space; perception of musical timbre
Spectral Timbre | Spectral Centroid | power-weighted average of the frequency of the bins in the linear power spectrum; sharpness of a sound
Spectral Timbre | Harmonic Spectral Centroid | amplitude-weighted mean of the spectrum's harmonic peaks; refers only to the tone's harmonic parts
Spectral Timbre | Harmonic Spectral Deviation |
Spectral Timbre | Harmonic Spectral Spread | normalized amplitude-weighted standard deviation of the harmonic peaks
Spectral Timbre | Harmonic Spectral Variation | normalized correlation between the amplitudes of the harmonic peaks of two adjacent frames
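Table V describes Audio Spectrum Flatness as the deviation from a flat spectral shape per frequency band. The minimal Python sketch below assumes the common definition of spectral flatness as the ratio of the geometric mean to the arithmetic mean of the power coefficients in one band. The exact MPEG-7 formula, including its band grouping, is given in Appendix D and is not reproduced here. The band contents are hypothetical toy values.

```python
import math


def spectral_flatness(power):
    """Ratio of geometric to arithmetic mean of positive power values in one band.

    Near 1 for a flat, noise-like band; near 0 when a single tonal peak dominates."""
    if not power or any(p <= 0 for p in power):
        raise ValueError("power values must be positive")
    log_mean = sum(math.log(p) for p in power) / len(power)  # log of the geometric mean
    return math.exp(log_mean) / (sum(power) / len(power))


flat_band = [1.0] * 8             # hypothetical noise-like band
tonal_band = [1e-6] * 7 + [1.0]   # hypothetical band with one dominant coefficient
print(round(spectral_flatness(flat_band), 3))  # 1.0
print(spectral_flatness(tonal_band) < 0.01)    # True
```

A low flatness value thus flags a band dominated by a tonal component, which is why this descriptor is considered relevant for describing tonal structures.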

Fig. 5. SHORT TIME FOURIER TRANSFORM (LEFT PLOT) AND THE CORRESPONDING DYADIC BAND VALUES (RIGHT PLOT). THE COLORMAPS SHOW DB-VALUES. THE FINE STRUCTURE OF THE FOURIER COEFFICIENTS IS NO LONGER VISIBLE, ESPECIALLY BECAUSE THE LOWER LOGARITHMIC BANDS ARE NOT SUFFICIENTLY FILLED. FOR A DYADIC SCALE, A WAVELET TRANSFORM IS MUCH MORE SUITABLE. (Left panel: Pa3ivMen36, frequency [Hz] versus time [seconds]; right panel: dyadic scaling, Pa3ivMen, band index versus time [seconds].)

A. Summary

In this section we have studied the MPEG-7 audio content descriptors. Among them, we consider the Audio Spectrum Envelope and the Audio Spectrum Flatness descriptors to be possibly relevant for precision classification, because they aim to describe tonal structures. For both descriptors, a weighting method to obtain a logarithmic frequency axis is specified. We find that this procedure is very sensitive to its parametrization: in our example, using the default values for the edge frequencies and the octave frequency resolution, the lower logarithmic bands are not reasonably filled, resulting in a very coarse representation of the audio content. We expect such a representation to be unsuitable for precisely describing highly complex audio signals. In terms of scaling properties and a smaller set of parameters, wavelets are much more suitable, as demonstrated in Section IV. In the next section we give a survey on wavelets and discuss their possible relevance for precision descriptors.

IV. SURVEY ON WAVELETS

In this section we give a survey on wavelets for audio content description; see [9] [2] [25] for a more general introduction to wavelets. Because wavelets are closely related to Fourier analysis, we first look at Fourier techniques, which are employed extensively in MPEG-7, and try to answer general questions on content description with both techniques. We then explain why we prefer wavelets for the content description of highly complex audio.
Fourier analysis allows any (sufficiently regular) 2π-periodic function to be represented as a sum of sine and cosine functions,

$$f(x) = a_0 + \sum_{k=1}^{\infty}\left[a_k\cos(kx) + b_k\sin(kx)\right], \qquad (5)$$

where the Fourier coefficients are given by

$$a_0 = \frac{1}{2\pi}\int_0^{2\pi} f(x)\,dx, \qquad a_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\cos(kx)\,dx, \qquad b_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\sin(kx)\,dx. \qquad (6)$$

We now want to analyse the first three notes of movement V (Chaconne) of Bach's Partita No. 2. The notation of these notes is illustrated in Figure 6a. An analytic representation s(t) of these notes can be given as

$$s(t) = \begin{cases} A_1\sin(w_1 t) + A_2\sin(w_2 t) + A_3\sin(w_3 t), & 0 \le t < t_1 \\ 0, & t_1 \le t < t_2 \\ A_4\sin(w_1 t) + A_5\sin(w_2 t) + A_6\sin(w_3 t), & t_2 \le t < t_3 \\ A_7\sin(w_1 t) + A_8\sin(w_4 t) + A_9\sin(w_5 t) + A_{10}\sin(w_6 t), & t_3 \le t < t_4 \end{cases} \qquad (7)$$

With Table VIII, the frequencies f_1, ..., f_6 shown in Table IX can be calculated; the angular frequencies in Eq. (7) are w_i = 2π f_i. To obtain temporal agreement with Figure 6a we choose t_1 = .475 seconds, t_2 = .5 seconds, t_3 = 2 seconds, and t_4 = 3 seconds. As Figure 6a does not describe the loudness of the individual notes, we choose equal amplitudes A_1 = A_2 = ... = A_10 for all sinusoids. Figure 7a shows a Fourier analysis of this signal. The frequencies are clearly resolved; however, the individual notes are not resolved in time. For this reason the short-term Fourier transform has been proposed. This technique performs a Fourier transform on short segments of the time signal only. To reduce discontinuities at the segment edges, a window function is generally applied and slid over the entire time signal. Figure 8a shows such a short-term Fourier transform with Hanning windows that overlap by half of their length. The frequencies are still reasonably resolved, but the time resolution is not satisfactory due to the large window size of 3 milliseconds. We therefore reduce the window size to 8 milliseconds, as shown in Figure 8b. Now the time resolution of the individual events is very good, but the different frequencies are no longer resolved. With Fourier techniques, the choice of the window size remains a compromise between good frequency resolution and good time resolution. Nevertheless, this compromise alone should not exclude the powerful Fourier technique from consideration for precision descriptors: we could develop a parametrization technique that constructs two content description vectors, one that reasonably resolves time and one that reasonably resolves frequency.
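The window-size compromise can be reproduced with a simplified version of the synthetic signal of Eq. (7). The sketch below assumes NumPy, an 8 kHz sampling rate, and a single chord made of the two frequencies f_2 = 352 Hz and f_3 = 440 Hz from Table IX, sounding between 0.25 s and 0.75 s (hypothetical timings). It computes a magnitude STFT with Hann (Hanning) windows overlapping by half their length. The helper `resolves` is an ad hoc criterion introduced only for this illustration: it reports whether the time-averaged spectrum dips clearly between the two tones.

```python
import numpy as np

FS = 8000  # assumed sampling rate in Hz (illustrative)

def chord(freqs, t_on, t_off, duration=1.0, fs=FS):
    """Sum of unit-amplitude sinusoids that sound only for t_on <= t < t_off."""
    t = np.arange(int(duration * fs)) / fs
    gate = (t >= t_on) & (t < t_off)
    return sum(np.sin(2 * np.pi * f * t) for f in freqs) * gate

def stft_mag(x, n_win, hop=None):
    """Magnitude STFT with Hann windows; the hop defaults to half the window (50 % overlap)."""
    hop = hop or n_win // 2
    window = np.hanning(n_win)
    n_frames = 1 + (len(x) - n_win) // hop
    frames = np.stack([x[i * hop:i * hop + n_win] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def resolves(x, f_a, f_b, n_win, fs=FS):
    """True if the time-averaged spectrum shows a clear dip between f_a and f_b."""
    spec = stft_mag(x, n_win).mean(axis=0)
    ia, ib, im = (int(round(f * n_win / fs)) for f in (f_a, f_b, (f_a + f_b) / 2))
    if len({ia, ib, im}) < 3:  # the tones and their midpoint share FFT bins
        return False
    return spec[im] < 0.5 * min(spec[ia], spec[ib])

x = chord([352.0, 440.0], t_on=0.25, t_off=0.75)
for n_win in (4096, 64):
    print(f"window {1000 * n_win / FS:6.1f} ms, bin spacing {FS / n_win:6.1f} Hz, "
          f"352/440 Hz resolved: {resolves(x, 352.0, 440.0, n_win)}")
```

With the 512 ms window the bin spacing of about 2 Hz separates the two tones, which lie 88 Hz apart, but each frame spans half a second, so note onsets cannot be localized. With the 8 ms window the frames advance in 4 ms steps, but the 125 Hz bin spacing no longer separates the tones.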
To test this idea, we now analyse a real performance of Bach's Chaconne using Fourier techniques. Figure 7b shows the frequencies measured by a Fourier transform of Yehudi Menuhin's 1934 performance. These frequencies are less precisely resolved than in Figure 7a. In fact, a variety of frequencies is measured, of which Figure 7b shows only a part because of the restricted bandwidth of our plot. These additional frequencies are overtones and harmonics. Harmonics are integer multiples of the fundamental frequency; an overtone is any resonant frequency above the fundamental, so overtones can, but need not, be harmonics. These tones are responsible for the timbre of the sound and are very important for audio content description. As shown in Figure 9a, the short-term Fourier transform has even more difficulty resolving these frequencies. Figure 9b shows that even a very small Hanning window does not allow for a precise time resolution. These plots indicate that Fourier techniques are less suitable for precision content description of highly complex audio signals, for several reasons:
1) Each musical sound has a specific timbre, which results in highly complex, less regular signals. The spectrum does not show bounded, smooth structures, which makes an appropriate parametrization towards either time or frequency resolution difficult.
2) Figure 9b indicates that Y. Menuhin's performance agrees less with the tone pitches and timing noted in Bach's manuscript, which are shown in Figure 6a. This is due to Y. Menuhin's individual interpretation of the Chaconne, which is roughly detailed in Figure 6b. Such individual interpretations are common in musical performances and make a generalization with respect to time and frequency extremely difficult.
3) Our analytic low-complexity signal assumed equal amplitudes for all sinusoids. In reality each tone has a different loudness, which makes a reliable frequency estimation more difficult.
The Fourier coefficients describe similarities of the audio signal to sinusoids, but these similarities do not capture the very specific features we are looking for. We therefore now consider wavelets, which address the time-frequency resolution problem of the Fourier technique.

A. Wavelets for Audio Content Description
A wavelet transform is closely related to a Fourier transform. A Fourier transform decomposes a signal into a sum of weighted sinusoids; the weights are called Fourier coefficients. A wavelet transform decomposes a signal into a weighted sum of wavelet functions; the weights are called wavelet coefficients. The Fourier coefficients can be calculated with the operation of convolution. A convolution (∗) can be interpreted as a correlation (⋆) with the flipped function:

$$s(t) \ast \psi(t) = s(t) \star \psi(-t). \qquad (8)$$
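Eq. (8) can be checked numerically for sampled, real-valued signals. The sketch assumes NumPy, whose `np.correlate(a, v, "full")` computes the sum over n of a[n+k]·v[n] for every lag k; correlating with the reversed array therefore reproduces `np.convolve`. The random test data are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.standard_normal(32)     # arbitrary sampled test signal s(n)
psi = rng.standard_normal(5)    # arbitrary real-valued analysing function psi(n)

conv = np.convolve(s, psi, mode="full")
# Correlating with the flipped function psi(-n) gives the same result as the convolution.
corr = np.correlate(s, psi[::-1], mode="full")
print(np.allclose(conv, corr))  # True
```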

[Figure 6: (a) notation of the first three notes; (b) interpretation by Y. Menuhin]

Fig. 6 FIRST THREE NOTES OF BACH'S FAMOUS CHACONNE. THE MANUSCRIPT (FIG. 6A) ALLOWS FOR A VERIFICATION OF THE MEASURED FREQUENCIES. FOR PRACTICAL REASONS THESE NOTES ARE GENERALLY PLAYED SLIGHTLY DIFFERENTLY. FIG. 6B ROUGHLY DESCRIBES THE INTERPRETATION OF Y. MENUHIN, WHERE THE CHORDS ARE BROKEN. SUCH INDIVIDUAL INTERPRETATIONS, EACH RESULTING IN RECORDINGS THAT DIFFER IN TIME AND FREQUENCY, ARE COMMON IN MUSICAL PERFORMANCES AND MAKE A GENERALIZATION OF AUDIO CONTENT EXTREMELY DIFFICULT.

TABLE VIII
INTERVALS AND FREQUENCY RATIOS, EXCLUDING DIMINISHED AND AUGMENTED INTERVALS. RATIOS WITH SMALL NUMBERS PRODUCE A CONSONANT SOUND, WHEREAS SECONDS AND SEVENTHS PRODUCE A DISSONANT SOUND.

interval          half steps   frequency ratio
unison                 0            1:1
minor second           1           16:15
major second           2            9:8
minor third            3            6:5
major third            4            5:4
perfect fourth         5            4:3
perfect fifth          7            3:2
minor sixth            8            8:5
major sixth            9            5:3
minor seventh         10            9:5
major seventh         11           15:8
perfect octave        12            2:1

TABLE IX
FREQUENCIES IN [HZ] OF THE FIRST NOTES OF THE CHACONNE, CALCULATED FROM FIG. 6.

0 ≤ t < t_1 and t_2 ≤ t < t_3      t_3 ≤ t < t_4
                                   e   f_6 = 660 Hz
a   f_3 = 440 Hz                   b   f_5 = … Hz
f   f_2 = 352 Hz                   g   f_4 = 39… Hz
d   f_1 = … Hz                     d   f_1 = … Hz

In Eq. (8), ψ(−t) denotes the flipped version of ψ(t). The Fourier coefficients can thus be interpreted as measures of similarity to sinusoids. Similarly, the wavelet coefficients are measures of similarity to wavelet functions. A wavelet coefficient is calculated for a scale s and a position p. The scale s describes how the mother wavelet function is scaled: it can be either dilated or compressed. The position p describes a shift of the wavelet function. Thus, the wavelet coefficients are calculated as

$$C(s, p) = \int_{\mathbb{R}} s(t)\,\frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t-p}{s}\right)dt, \qquad (9)$$

where s(t) denotes the signal and s the scale. Whereas a short-term Fourier transform yields a time-frequency representation of the signal, the wavelet transform yields a time-scale representation, as illustrated in Figure 10.
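Eq. (9) can be evaluated directly by replacing the integral with a Riemann sum. The sketch below is a brute-force illustration, not an efficient implementation (practical implementations use FFT-based convolution). It assumes NumPy, uses the Mexican Hat wavelet introduced later in this section, measures scales in seconds, and evaluates a coefficient at every sample position. The test signal, a slow 2 Hz sinusoid plus a short Gaussian click at 0.3 s, is hypothetical.

```python
import numpy as np

def mexican_hat(x):
    """Mexican Hat wavelet with the normalization used later in this section."""
    return 2.0 / (np.sqrt(3.0) * np.pi ** 0.25) * (1.0 - x ** 2) * np.exp(-x ** 2 / 2.0)

def cwt(signal, scales, fs, wavelet=mexican_hat):
    """Riemann-sum approximation of C(s, p) = integral of s(t) psi((t - p)/s) / sqrt(s) dt.

    Scales are given in seconds; a coefficient is computed at every sample position p.
    """
    t = np.arange(len(signal)) / fs
    coeffs = np.empty((len(scales), len(signal)))
    for i, scale in enumerate(scales):
        for j, p in enumerate(t):
            psi = wavelet((t - p) / scale) / np.sqrt(scale)
            coeffs[i, j] = np.sum(signal * psi) / fs  # dt = 1 / fs
    return coeffs

fs = 1000                                   # assumed sampling rate in Hz
t = np.arange(fs) / fs                      # one second of signal
sig = 0.5 * np.sin(2 * np.pi * 2 * t) + np.exp(-0.5 * ((t - 0.3) / 0.005) ** 2)
C = cwt(sig, scales=[0.005, 0.02, 0.1], fs=fs)
print(t[np.argmax(np.abs(C[0]))])           # smallest scale: peaks at the click near 0.3 s
```

At the smallest scale the wavelet is compressed to a few milliseconds and responds almost only to the click; the slow sinusoid contributes mainly at the largest scale.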
Because the effective window length adapts to the scale, the resolution problem of the Fourier technique, namely the choice of a single window size that favours either time or frequency, does not arise in the same form for a wavelet transform. When performing a wavelet decomposition, the scaled mother wavelet function is slid along the entire signal s(t), and a wavelet coefficient is calculated for each shift p. This procedure is repeated for each scale. The higher the scale, the more dilated the mother function; the lower the scale, the more compressed it is. Therefore, a high scale corresponds to a low frequency, whereas a low scale corresponds to a high frequency. A wavelet transform that uses only scales that are powers of two, s = 2^j, and shifts that are integer multiples of the scale, p = m·2^j, is called a dyadic wavelet transform. An important property of wavelets is their vanishing moments:

$$\int_{\mathbb{R}} t^j\,\psi(t)\,dt = 0, \qquad j = 0, \ldots, k. \qquad (10)$$
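For filter-defined wavelets, Eq. (10) has a discrete counterpart: the high-pass (detail) filter g annihilates sampled polynomials up to degree k, i.e. the sum over n of n^j·g[n] vanishes for j = 0, ..., k. The sketch below checks this for the four-tap Daubechies 2 filter (k = 1). It assumes NumPy, uses the well-known closed-form low-pass coefficients, and builds the high-pass filter with one common quadrature-mirror convention; libraries differ in ordering and sign.

```python
import numpy as np

SQ3 = np.sqrt(3.0)
# Daubechies 2 low-pass (scaling) filter in closed form; the coefficients sum to sqrt(2).
h = np.array([1 + SQ3, 3 + SQ3, 3 - SQ3, 1 - SQ3]) / (4 * np.sqrt(2.0))
# Quadrature-mirror high-pass filter: g[n] = (-1)^n h[N-1-n] (one common convention).
g = np.array([(-1) ** n * h[len(h) - 1 - n] for n in range(len(h))])

def discrete_moment(filt, j):
    """Discrete analogue of Eq. (10): sum over n of n^j * filt[n]."""
    n = np.arange(len(filt))
    return np.sum(n ** j * filt)

moments = [discrete_moment(g, j) for j in range(3)]
print(np.round(moments, 6))  # j = 0 and j = 1 vanish, j = 2 does not

# Polynomials of degree <= 1 are suppressed: filtering a straight line gives zeros.
n = np.arange(20)
line = 3.0 + 2.0 * n
print(np.allclose(np.convolve(line, g, mode="valid"), 0.0))  # True
# A parabola is not suppressed; the output is a non-zero constant.
parabola = n.astype(float) ** 2
print(np.convolve(parabola, g, mode="valid")[:3])
```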

[Figure 7: real part of X(k) vs. frequency [Hz]; panel a, synthetic signal; panel b, performance of Y. Menuhin]

Fig. 7 MEASURED FREQUENCIES WITH THE FOURIER TECHNIQUE. FIG. 7A SHOWS EXACTLY THE FREQUENCIES WE HAVE CALCULATED FROM BACH'S MANUSCRIPT; HOWEVER, IT IS COMPUTED FROM A MATHEMATICAL SIGNAL THAT CANNOT BE GENERATED BY ANY VIOLIN. FIG. 7B SHOWS THE FOURIER ANALYSIS OF THE PERFORMANCE OF Y. MENUHIN. A VARIETY OF OVERTONES IS MEASURED THAT ARE RESPONSIBLE FOR THE SOUND TIMBRE.

This property allows for the suppression of the polynomials $s(t) = \sum_{j=0}^{k} a_j t^j$: all these polynomials have zero wavelet coefficients. The number of vanishing moments is called the order of the wavelet. Two examples of wavelet functions, illustrated in Figure 11, are the Mexican Hat and the Morlet wavelet, which are defined as

$$\psi_{\mathrm{mexh}}(x) = \frac{2}{\sqrt{3}\,\pi^{1/4}}\left(1 - x^2\right)e^{-x^2/2} \qquad (11)$$

and

$$\psi_{\mathrm{morl}}(x) = C\,e^{-x^2/2}\cos(5x), \qquad (12)$$

where C is a normalization constant. These two wavelets are exceptions, because wavelets generally have no closed-form analytical expression. Instead, a wavelet's shape is given by its corresponding filter coefficients. These coefficients define the decomposition filters of a discrete wavelet transform, in which a signal is decomposed into details and approximations, as illustrated in Figure 12. The details refer to the signal's high-frequency components, as they are calculated using a high-pass filter; they indicate similarities to the wavelet mother function ψ. For many wavelets there exists an additional function that is closely related to the wavelet mother function: the scaling function φ, which is associated with the approximations, i.e., the signal's low-frequency components. Thus, a wavelet decomposition can be performed using high- and low-pass filters that yield the wavelet detail and approximation coefficients. The shape of the wavelet function can be approximated by iteratively upsampling and convolving the filter coefficients (the iteration runs on the low-pass filter, and the high-pass filter is applied once), see Figure 13. Similarly, the shape of the scaling function can be approximated by upsampling and convolving the low-pass filter.
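The decomposition tree can be sketched in a few lines. For simplicity the sketch swaps in the Haar wavelet (two-tap filters) instead of the Daubechies wavelets discussed here, and it assumes an input length divisible by 2^levels; real implementations handle other lengths by extending the signal at its boundaries. The function names are hypothetical helpers, not the API of any wavelet library. It assumes NumPy.

```python
import numpy as np

def haar_step(x):
    """One decomposition step: low-/high-pass filtering followed by downsampling by 2."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)   # cA: low-frequency content
    detail = (even - odd) / np.sqrt(2.0)   # cD: high-frequency content
    return approx, detail

def haar_inverse_step(approx, detail):
    """Reconstruction step: upsampling and synthesis filtering."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def decompose(x, levels):
    """Iterate on the approximations; returns cA_L and the list [cD_1, ..., cD_L]."""
    approx, details = np.asarray(x, dtype=float), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details

def reconstruct(approx, details):
    """Invert decompose() by applying the reconstruction steps from the coarsest level."""
    for d in reversed(details):
        approx = haar_inverse_step(approx, d)
    return approx

rng = np.random.default_rng(1)
s = rng.standard_normal(16)
cA2, (cD1, cD2) = decompose(s, levels=2)
zeros = np.zeros_like
A2 = reconstruct(cA2, [zeros(cD1), zeros(cD2)])
D2 = reconstruct(zeros(cA2), [zeros(cD1), cD2])
D1 = reconstruct(zeros(cA2), [cD1, zeros(cD2)])
print(np.allclose(A2 + D2 + D1, s))  # True: s = A2 + D2 + D1
```

Because every step is linear, reconstructing each coefficient set separately yields signal components that add up to the original signal, matching the relation s = A1 + D1 = A2 + D2 + D1 of the decomposition tree.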
The original signal can be reconstructed with reconstruction filters. A set of low- and high-pass decomposition and reconstruction filters is called a system of quadrature-mirror filters. The filters have to be designed in such a way that aliasing effects are minimized. The main drawback of the short-term Fourier technique is the fixed size of its analysis window. With a large window, the frequencies cannot be sufficiently resolved in time. A small window results in a fine time resolution; however, low-frequency components can no longer be measured. Wavelet analysis addresses this drawback: the varying wavelet scale allows low-frequency components to be analysed at a fine frequency resolution and high-frequency components at a fine time resolution. One reason why wavelets have not yet displaced the Fourier technique, especially in the area of content description, may be that the different scales are less intuitive to interpret. As noted above, the scales correspond to high and low frequencies. However, compared to a Fourier transform, a wavelet decomposition is not a straightforward frequency estimation technique. In fact, the wavelet coefficients indicate similarities to wavelet functions, which do not have a frequency because they are not periodic. The relation between scale and frequency exists only because a pseudo-frequency can be assigned to a wavelet function: it is the frequency of the periodic function that approximates the shape of the scaled wavelet over its short support as closely as possible. Furthermore, an analysis of concurrent frequencies with wavelets is difficult. In general, the Fourier technique is preferable for analysing the frequencies of stationary signals.

[Figure 8: spectrograms, frequency [Hz] vs. time [seconds], panels a and b]

Fig. 8 SHORT-TERM, TIME-LOCALIZED FREQUENCY TRANSFORM. THE LEFT PLOT REASONABLY RESOLVES THE FREQUENCIES. THE RIGHT PLOT ALLOWS FOR A CORRECT TIME RESOLUTION OF THE EVENTS. A SHORT-TERM FOURIER TRANSFORM IS ALWAYS A COMPROMISE BETWEEN TIME AND FREQUENCY RESOLUTION.

[Figure 9: spectrograms, frequency [Hz] vs. time [seconds], panels a and b]

Fig. 9 SHORT-TERM FOURIER TRANSFORM OF THE FIRST THREE NOTES OF BACH'S CHACONNE PERFORMED BY Y. MENUHIN. THE SAME PARAMETERS AS IN FIGURE 8 HAVE BEEN CHOSEN FOR A REASONABLE FREQUENCY RESOLUTION (LEFT PLOT) AND A REASONABLE TIME RESOLUTION (RIGHT PLOT). THE SHARP AND BOUNDED STRUCTURES OF FIGURE 8 ARE NO LONGER VISIBLE DUE TO THE SOUND TIMBRE. FURTHERMORE, THE PERFORMANCE OF MENUHIN AS DETAILED IN FIGURE 6B RESULTS IN A MORE COMPLEX REPRESENTATION OF THE AUDIO CONTENT IN THE FREQUENCY DOMAIN.

[Figure 10: scale axis (higher scale: low frequency; lower scale: high frequency) vs. time (position); the wavelet is convolved with the signal]

Fig. 10 A WAVELET TRANSFORM IS A TIME-SCALE REPRESENTATION OF THE SIGNAL. THE DILATED OR COMPRESSED WAVELET MOTHER FUNCTION (DIFFERENT SCALES) IS SLID OVER THE ENTIRE SIGNAL (DIFFERENT POSITIONS). A WAVELET COEFFICIENT C(s, p) IS A MEASURE OF SIMILARITY BETWEEN THE SIGNAL AND THE WAVELET FUNCTION FOR A SCALE s AND A POSITION p.

[Figure 11: Mexican Hat and Morlet wavelet functions]

Fig. 11 MEXICAN HAT AND MORLET WAVELET FUNCTION. WAVELET FUNCTIONS GENERALLY DECAY QUICKLY TOWARDS ZERO.

In this thesis we are looking for an extraction technique that allows for the representation of specific features of the audio signals. Our database contains highly complex, non-stationary, and less regular audio signals. From an intuitive point of view, it makes more sense to describe these signals with more complex and less regular wavelet functions than with periodic sinusoids. Wavelets can reveal very small discontinuities that cannot be described by sinusoids. Figure 14 indicates that the wavelet coefficients indeed describe very specific details of the audio signal: each single event is resolved by very sharp and bounded patterns. A compact description of these patterns would allow us to verify the assumptions stated here. From now on we consider a non-dyadic wavelet transform for our methodology. Recall that a dyadic wavelet transform uses only powers of two for the scales, with shifts on the corresponding dyadic grid. A dyadic wavelet transform results in a more space-saving representation of the content; however, the extracted features are less readable and of lower precision. For our derivation of a technique for precision classifiers we initially need all the details that can be resolved by the wavelet technique.

[Figure 12: decomposition tree; the signal s(n) passes low-pass (φ) and high-pass (ψ) filters with downsampling, giving approximation coefficients cA1, cA2 and detail coefficients cD1, cD2]

Fig. 12 WAVELET DECOMPOSITION TREE. A SIGNAL CAN BE DECOMPOSED INTO APPROXIMATIONS AND DETAILS: s = A1 + D1 = A2 + D2 + D1.

[Figure 13: Daubechies 2 wavelet shape after the 4th and a later iteration]

Fig. 13 SHAPE OF THE WIDELY USED DAUBECHIES 2 WAVELET. A WAVELET SHAPE CAN BE APPROXIMATED BY UPSAMPLING AND CONVOLUTION OF THE LOW-PASS RECONSTRUCTION FILTER COEFFICIENTS.

V. CONTENT DESCRIPTION WITH WAVELETS
As detailed in Section IV, we want to use wavelet coefficients for the description of audio content. The wavelet coefficients describe the audio content precisely; however, we only want to retain a very compact representation that allows for efficient search and retrieval. The feature extraction technique has to solve an extremely demanding problem. On the one hand, very precise content information has to be extracted, because we want to derive precision descriptors, which allow audio data to be classified even if very similar content is described. On the other hand, the extracted data should allow for generalization: the data must not describe the content too precisely, because then it would not be possible to identify a recording that is not part of the system's example set. Figure 15 roughly describes the scenario we consider in this chapter. We consider movement iv of Sonata No. 1 recorded by Y. Menuhin and N. Milstein. The recording of N. Milstein represents


More information

V(t) = Total Power = Calculating the Power Spectral Density (PSD) in IDL. Thomas Ferree, Ph.D. August 23, 1999

V(t) = Total Power = Calculating the Power Spectral Density (PSD) in IDL. Thomas Ferree, Ph.D. August 23, 1999 Calculating the Power Spectral Density (PSD) in IDL Thomas Ferree, Ph.D. August 23, 1999 This note outlines the calculation of power spectra via the fast Fourier transform (FFT) algorithm. There are several

More information

Fourier Analysis of Signals

Fourier Analysis of Signals Chapter 2 Fourier Analysis of Signals As we have seen in the last chapter, music signals are generally complex sound mixtures that consist of a multitude of different sound components. Because of this

More information

A First Course in Wavelets with Fourier Analysis

A First Course in Wavelets with Fourier Analysis * A First Course in Wavelets with Fourier Analysis Albert Boggess Francis J. Narcowich Texas A& M University, Texas PRENTICE HALL, Upper Saddle River, NJ 07458 Contents Preface Acknowledgments xi xix 0

More information

Introduction p. 1 Compression Techniques p. 3 Lossless Compression p. 4 Lossy Compression p. 5 Measures of Performance p. 5 Modeling and Coding p.

Introduction p. 1 Compression Techniques p. 3 Lossless Compression p. 4 Lossy Compression p. 5 Measures of Performance p. 5 Modeling and Coding p. Preface p. xvii Introduction p. 1 Compression Techniques p. 3 Lossless Compression p. 4 Lossy Compression p. 5 Measures of Performance p. 5 Modeling and Coding p. 6 Summary p. 10 Projects and Problems

More information

Jean Morlet and the Continuous Wavelet Transform

Jean Morlet and the Continuous Wavelet Transform Jean Brian Russell and Jiajun Han Hampson-Russell, A CGG GeoSoftware Company, Calgary, Alberta, brian.russell@cgg.com ABSTRACT Jean Morlet was a French geophysicist who used an intuitive approach, based

More information

Linear Prediction 1 / 41

Linear Prediction 1 / 41 Linear Prediction 1 / 41 A map of speech signal processing Natural signals Models Artificial signals Inference Speech synthesis Hidden Markov Inference Homomorphic processing Dereverberation, Deconvolution

More information

WAVELET TRANSFORMS IN TIME SERIES ANALYSIS

WAVELET TRANSFORMS IN TIME SERIES ANALYSIS WAVELET TRANSFORMS IN TIME SERIES ANALYSIS R.C. SINGH 1 Abstract The existing methods based on statistical techniques for long range forecasts of Indian summer monsoon rainfall have shown reasonably accurate

More information

1 Introduction to Wavelet Analysis

1 Introduction to Wavelet Analysis Jim Lambers ENERGY 281 Spring Quarter 2007-08 Lecture 9 Notes 1 Introduction to Wavelet Analysis Wavelets were developed in the 80 s and 90 s as an alternative to Fourier analysis of signals. Some of the

More information

Filter Banks II. Prof. Dr.-Ing. G. Schuller. Fraunhofer IDMT & Ilmenau University of Technology Ilmenau, Germany

Filter Banks II. Prof. Dr.-Ing. G. Schuller. Fraunhofer IDMT & Ilmenau University of Technology Ilmenau, Germany Filter Banks II Prof. Dr.-Ing. G. Schuller Fraunhofer IDMT & Ilmenau University of Technology Ilmenau, Germany Page Modulated Filter Banks Extending the DCT The DCT IV transform can be seen as modulated

More information

GIS Visualization: A Library s Pursuit Towards Creative and Innovative Research

GIS Visualization: A Library s Pursuit Towards Creative and Innovative Research GIS Visualization: A Library s Pursuit Towards Creative and Innovative Research Justin B. Sorensen J. Willard Marriott Library University of Utah justin.sorensen@utah.edu Abstract As emerging technologies

More information

LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES

LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES Abstract March, 3 Mads Græsbøll Christensen Audio Analysis Lab, AD:MT Aalborg University This document contains a brief introduction to pitch

More information

Lecture Hilbert-Huang Transform. An examination of Fourier Analysis. Existing non-stationary data handling method

Lecture Hilbert-Huang Transform. An examination of Fourier Analysis. Existing non-stationary data handling method Lecture 12-13 Hilbert-Huang Transform Background: An examination of Fourier Analysis Existing non-stationary data handling method Instantaneous frequency Intrinsic mode functions(imf) Empirical mode decomposition(emd)

More information

Evolutionary Power Spectrum Estimation Using Harmonic Wavelets

Evolutionary Power Spectrum Estimation Using Harmonic Wavelets 6 Evolutionary Power Spectrum Estimation Using Harmonic Wavelets Jale Tezcan Graduate Student, Civil and Environmental Engineering Department, Rice University Research Supervisor: Pol. D. Spanos, L.B.

More information

COMP 546, Winter 2018 lecture 19 - sound 2

COMP 546, Winter 2018 lecture 19 - sound 2 Sound waves Last lecture we considered sound to be a pressure function I(X, Y, Z, t). However, sound is not just any function of those four variables. Rather, sound obeys the wave equation: 2 I(X, Y, Z,

More information

Course content (will be adapted to the background knowledge of the class):

Course content (will be adapted to the background knowledge of the class): Biomedical Signal Processing and Signal Modeling Lucas C Parra, parra@ccny.cuny.edu Departamento the Fisica, UBA Synopsis This course introduces two fundamental concepts of signal processing: linear systems

More information

CMPT 889: Lecture 3 Fundamentals of Digital Audio, Discrete-Time Signals

CMPT 889: Lecture 3 Fundamentals of Digital Audio, Discrete-Time Signals CMPT 889: Lecture 3 Fundamentals of Digital Audio, Discrete-Time Signals Tamara Smyth, tamaras@cs.sfu.ca School of Computing Science, Simon Fraser University October 6, 2005 1 Sound Sound waves are longitudinal

More information

Digital Image Processing Lectures 15 & 16

Digital Image Processing Lectures 15 & 16 Lectures 15 & 16, Professor Department of Electrical and Computer Engineering Colorado State University CWT and Multi-Resolution Signal Analysis Wavelet transform offers multi-resolution by allowing for

More information

Homework: 4.50 & 4.51 of the attachment Tutorial Problems: 7.41, 7.44, 7.47, Signals & Systems Sampling P1

Homework: 4.50 & 4.51 of the attachment Tutorial Problems: 7.41, 7.44, 7.47, Signals & Systems Sampling P1 Homework: 4.50 & 4.51 of the attachment Tutorial Problems: 7.41, 7.44, 7.47, 7.49 Signals & Systems Sampling P1 Undersampling & Aliasing Undersampling: insufficient sampling frequency ω s < 2ω M Perfect

More information

CS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS

CS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS CS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS LAST TIME Intro to cudnn Deep neural nets using cublas and cudnn TODAY Building a better model for image classification Overfitting

More information

arxiv: v1 [math.ca] 6 Feb 2015

arxiv: v1 [math.ca] 6 Feb 2015 The Fourier-Like and Hartley-Like Wavelet Analysis Based on Hilbert Transforms L. R. Soares H. M. de Oliveira R. J. Cintra Abstract arxiv:150.0049v1 [math.ca] 6 Feb 015 In continuous-time wavelet analysis,

More information

The Discrete Fourier Transform

The Discrete Fourier Transform In [ ]: cd matlab pwd The Discrete Fourier Transform Scope and Background Reading This session introduces the z-transform which is used in the analysis of discrete time systems. As for the Fourier and

More information

arxiv: v1 [cs.sd] 25 Oct 2014

arxiv: v1 [cs.sd] 25 Oct 2014 Choice of Mel Filter Bank in Computing MFCC of a Resampled Speech arxiv:1410.6903v1 [cs.sd] 25 Oct 2014 Laxmi Narayana M, Sunil Kumar Kopparapu TCS Innovation Lab - Mumbai, Tata Consultancy Services, Yantra

More information

Sound 2: frequency analysis

Sound 2: frequency analysis COMP 546 Lecture 19 Sound 2: frequency analysis Tues. March 27, 2018 1 Speed of Sound Sound travels at about 340 m/s, or 34 cm/ ms. (This depends on temperature and other factors) 2 Wave equation Pressure

More information

FOURIER ANALYSIS. (a) Fourier Series

FOURIER ANALYSIS. (a) Fourier Series (a) Fourier Series FOURIER ANAYSIS (b) Fourier Transforms Useful books: 1. Advanced Mathematics for Engineers and Scientists, Schaum s Outline Series, M. R. Spiegel - The course text. We follow their notation

More information

Fourier Transforms For additional information, see the classic book The Fourier Transform and its Applications by Ronald N. Bracewell (which is on the shelves of most radio astronomers) and the Wikipedia

More information

Digital Image Processing

Digital Image Processing Digital Image Processing, 2nd ed. Digital Image Processing Chapter 7 Wavelets and Multiresolution Processing Dr. Kai Shuang Department of Electronic Engineering China University of Petroleum shuangkai@cup.edu.cn

More information

Application of Wavelet Transform and Its Advantages Compared To Fourier Transform

Application of Wavelet Transform and Its Advantages Compared To Fourier Transform Application of Wavelet Transform and Its Advantages Compared To Fourier Transform Basim Nasih, Ph.D Assitant Professor, Wasit University, Iraq. Abstract: Wavelet analysis is an exciting new method for

More information

Time-domain representations

Time-domain representations Time-domain representations Speech Processing Tom Bäckström Aalto University Fall 2016 Basics of Signal Processing in the Time-domain Time-domain signals Before we can describe speech signals or modelling

More information

1 The Continuous Wavelet Transform The continuous wavelet transform (CWT) Discretisation of the CWT... 2

1 The Continuous Wavelet Transform The continuous wavelet transform (CWT) Discretisation of the CWT... 2 Contents 1 The Continuous Wavelet Transform 1 1.1 The continuous wavelet transform (CWT)............. 1 1. Discretisation of the CWT...................... Stationary wavelet transform or redundant wavelet

More information

COMPLEX WAVELET TRANSFORM IN SIGNAL AND IMAGE ANALYSIS

COMPLEX WAVELET TRANSFORM IN SIGNAL AND IMAGE ANALYSIS COMPLEX WAVELET TRANSFORM IN SIGNAL AND IMAGE ANALYSIS MUSOKO VICTOR, PROCHÁZKA ALEŠ Institute of Chemical Technology, Department of Computing and Control Engineering Technická 905, 66 8 Prague 6, Cech

More information

Short-Time Fourier Transform and Chroma Features

Short-Time Fourier Transform and Chroma Features Friedrich-Alexander-Universität Erlangen-Nürnberg Lab Course Short-Time Fourier Transform and Chroma Features International Audio Laboratories Erlangen Prof. Dr. Meinard Müller Friedrich-Alexander Universität

More information

Lecture 7: Pitch and Chord (2) HMM, pitch detection functions. Li Su 2016/03/31

Lecture 7: Pitch and Chord (2) HMM, pitch detection functions. Li Su 2016/03/31 Lecture 7: Pitch and Chord (2) HMM, pitch detection functions Li Su 2016/03/31 Chord progressions Chord progressions are not arbitrary Example 1: I-IV-I-V-I (C-F-C-G-C) Example 2: I-V-VI-III-IV-I-II-V

More information

! Introduction. ! Discrete Time Signals & Systems. ! Z-Transform. ! Inverse Z-Transform. ! Sampling of Continuous Time Signals

! Introduction. ! Discrete Time Signals & Systems. ! Z-Transform. ! Inverse Z-Transform. ! Sampling of Continuous Time Signals ESE 531: Digital Signal Processing Lec 25: April 24, 2018 Review Course Content! Introduction! Discrete Time Signals & Systems! Discrete Time Fourier Transform! Z-Transform! Inverse Z-Transform! Sampling

More information

SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS. Emad M. Grais and Hakan Erdogan

SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS. Emad M. Grais and Hakan Erdogan SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS Emad M. Grais and Hakan Erdogan Faculty of Engineering and Natural Sciences, Sabanci University, Orhanli

More information

Wavelets. Lecture 28

Wavelets. Lecture 28 Wavelets. Lecture 28 Just like the FFT, the wavelet transform is an operation that can be performed in a fast way. Operating on an input vector representing a sampled signal, it can be viewed, just like

More information

Digital Speech Processing Lecture 10. Short-Time Fourier Analysis Methods - Filter Bank Design

Digital Speech Processing Lecture 10. Short-Time Fourier Analysis Methods - Filter Bank Design Digital Speech Processing Lecture Short-Time Fourier Analysis Methods - Filter Bank Design Review of STFT j j ˆ m ˆ. X e x[ mw ] [ nˆ m] e nˆ function of nˆ looks like a time sequence function of ˆ looks

More information

L29: Fourier analysis

L29: Fourier analysis L29: Fourier analysis Introduction The discrete Fourier Transform (DFT) The DFT matrix The Fast Fourier Transform (FFT) The Short-time Fourier Transform (STFT) Fourier Descriptors CSCE 666 Pattern Analysis

More information

Module 4 MULTI- RESOLUTION ANALYSIS. Version 2 ECE IIT, Kharagpur

Module 4 MULTI- RESOLUTION ANALYSIS. Version 2 ECE IIT, Kharagpur Module MULTI- RESOLUTION ANALYSIS Version ECE IIT, Kharagpur Lesson Multi-resolution Analysis: Theory of Subband Coding Version ECE IIT, Kharagpur Instructional Objectives At the end of this lesson, the

More information

Signal Modeling, Statistical Inference and Data Mining in Astrophysics

Signal Modeling, Statistical Inference and Data Mining in Astrophysics ASTRONOMY 6523 Spring 2013 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Course Approach The philosophy of the course reflects that of the instructor, who takes a dualistic view

More information

Comparison of spectral decomposition methods

Comparison of spectral decomposition methods Comparison of spectral decomposition methods John P. Castagna, University of Houston, and Shengjie Sun, Fusion Geophysical discuss a number of different methods for spectral decomposition before suggesting

More information

Module 4. Multi-Resolution Analysis. Version 2 ECE IIT, Kharagpur

Module 4. Multi-Resolution Analysis. Version 2 ECE IIT, Kharagpur Module 4 Multi-Resolution Analysis Lesson Multi-resolution Analysis: Discrete avelet Transforms Instructional Objectives At the end of this lesson, the students should be able to:. Define Discrete avelet

More information

Proc. of NCC 2010, Chennai, India

Proc. of NCC 2010, Chennai, India Proc. of NCC 2010, Chennai, India Trajectory and surface modeling of LSF for low rate speech coding M. Deepak and Preeti Rao Department of Electrical Engineering Indian Institute of Technology, Bombay

More information

Multirate signal processing

Multirate signal processing Multirate signal processing Discrete-time systems with different sampling rates at various parts of the system are called multirate systems. The need for such systems arises in many applications, including

More information

Discrete Wavelet Transform: A Technique for Image Compression & Decompression

Discrete Wavelet Transform: A Technique for Image Compression & Decompression Discrete Wavelet Transform: A Technique for Image Compression & Decompression Sumit Kumar Singh M.Tech Scholar, Deptt. of Computer Science & Engineering Al-Falah School of Engineering & Technology, Faridabad,

More information

Wavelet Footprints: Theory, Algorithms, and Applications

Wavelet Footprints: Theory, Algorithms, and Applications 1306 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 51, NO. 5, MAY 2003 Wavelet Footprints: Theory, Algorithms, and Applications Pier Luigi Dragotti, Member, IEEE, and Martin Vetterli, Fellow, IEEE Abstract

More information

ECG782: Multidimensional Digital Signal Processing

ECG782: Multidimensional Digital Signal Processing Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu ECG782: Multidimensional Digital Signal Processing Filtering in the Frequency Domain http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Background

More information

Chapter 17: Fourier Series

Chapter 17: Fourier Series Section A Introduction to Fourier Series By the end of this section you will be able to recognise periodic functions sketch periodic functions determine the period of the given function Why are Fourier

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 12 Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted for noncommercial,

More information