Speech Coding. Speech Processing. Tom Bäckström. Aalto University, October 2015


1 Speech Coding Speech Processing Tom Bäckström Aalto University October 2015

2 Introduction. Speech coding refers to the digital compression of speech signals for telecommunication (and storage) applications. There are over 7.6 billion active mobile subscriptions and more than 3.7 billion unique subscribers: more mobile phones than people! More than half the population owns a mobile phone. Speech coding is the biggest speech processing application. This demonstrates that speech coding is important and that it works with sufficient quality, and that further improvements will have a major impact.

3 Introduction. Generations in mobile networks:
1G: First widely deployed analog cellular system, AMPS.
2G: First digital systems, GSM and CDMA.
3G: EDGE & HSPA, systems with increased data speeds.
4G: LTE, systems with even faster data. More mobile phones than people.
201?: 5G, Internet of things?

4 Introduction. Some speech coding standards:

Name      Year   Bit-rate             Bandwidth
NMT       1981   Analog               3.5 kHz
GSM              kbit/s               3.5 kHz
EFR              kbit/s               3.5 kHz
AMR              kbit/s - kbit/s      3.5 kHz
AMR-WB           kbit/s - kbit/s      7 kHz
AMR-WB+          kbit/s - kbit/s      7 kHz
EVS              kbit/s - kbit/s      3.5 kHz - kHz

AMR has been the most successful codec of all time and is still widely used. Deployment of AMR-WB is progressing (started 2006, but still not finished). EVS is the first codec with native support for IP networks (LTE), and deployment could start in 2016.

5 Quality comparison as a function of bitrate: clean speech. NB = 3.5 kHz, WB = 7 kHz, SWB = 16 kHz, FB = 22 kHz.

6 Quality comparison as a function of bitrate: noisy speech. NB = 3.5 kHz, WB = 7 kHz, SWB = 16 kHz, FB = 22 kHz.

7 Quality comparison as a function of bitrate: mixed content (music and speech). NB = 3.5 kHz, WB = 7 kHz, SWB = 16 kHz, FB = 22 kHz.

8 Speech Production Modelling. [Block diagram: an impulse train shaped by F_0(z), summed with the residual E(z), is filtered by the linear predictor A(z) to produce speech X(z).] In the speech production modelling lecture we already presented the source-filter model (above), where a linear predictor A(z) models the acoustic effect of the vocal tract, a long-time predictor F_0(z) models the periodic structure of the excitation (the fundamental frequency), and the residual (excitation) E(z) is modelled with a noise codebook. The approach is known as Code-Excited Linear Prediction (CELP).

9 Speech Production Modelling. To be accurate, the long-time predictor is applied as a filter, whereby it is not an additive but a cascaded element.(1) [Block diagram: residual codebook, long-time predictor, linear predictor, speech.] The entire model can then be written as X(z) = A^{-1}(z) F_0^{-1}(z) E(z), or equivalently x_n = h_n * f_n * e_n, where h_n is the impulse response of A^{-1}(z) = H(z), f_n is the impulse response of F_0^{-1}(z) and * denotes convolution. Next, each of these components is presented in detail.
(1) The source-filter model is a good model and it is used in some applications. However, although it is often advertised that speech coding is also based on this approach, that is not entirely accurate.
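As a toy illustration of the cascade X(z) = A^{-1}(z) F_0^{-1}(z) E(z), the sketch below runs a white-noise excitation through the two all-pole stages, assuming scipy.signal.lfilter; the lag, gains and predictor coefficients are illustrative values only, not from the lecture.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(2)
e = rng.standard_normal(400)        # residual / excitation e_n (white noise)

T, gamma_p = 50, 0.9                # illustrative pitch lag and LTP gain
f0 = np.zeros(T + 1)
f0[0], f0[T] = 1.0, -gamma_p        # F0(z) = 1 - gamma_p * z^-T

alpha = [1.0, -0.9]                 # toy 1st-order predictor A(z) = 1 - 0.9 z^-1

# Synthesis direction: 1/F0(z) adds periodic (pitch) structure,
# then 1/A(z) adds the spectral envelope.
x = lfilter([1.0], f0, e)
x = lfilter([1.0], alpha, x)
```

Both stages are stable here (the poles of 1/F0(z) lie at radius 0.9^(1/50) and the pole of 1/A(z) at 0.9), so the output stays bounded while gaining both periodicity and spectral tilt.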

10 Linear prediction. Linear prediction is a model of the spectral envelope of a speech signal; it models the overall shape of the spectrum. It can be interpreted as a tube model of the vocal tract, but estimating the parameters of a tube model is difficult. Besides, by modelling everything in the spectral envelope, we do not need separate models for other effects such as the glottal excitation. Estimation of linear predictors was already discussed in a previous lecture. The residual of linear prediction, A(z)X(z) = F_0^{-1}(z)E(z), has only a harmonic structure and/or noise.

11 Linear prediction: Quantization and Coding. Our task is to quantize the parameters of the linear predictive filter A(z) and encode them. In that process, the aim is to minimize the perceptually weighted error between the original filter A^{-1}(z) and the quantized filter Â^{-1}(z): min || W(z) (A^{-1}(z) - Â^{-1}(z)) ||^2. Here W(z) is the perceptual weighting filter. Such an objective ensures that the perceptual effect of quantization is minimized, whereby the receiver can reconstruct the speech signal such that it maximally resembles the original signal.

12 Linear prediction: Quantization and Coding. A linear predictor is defined as A(z) = 1 + sum_{k=1}^{m} α_k z^{-k}, whereby its parameters are the m scalars α_k. These parameters entirely describe the linear predictor, whereby our objective is to quantize them. Unfortunately, the α_k are sensitive to errors: small errors in α_k can have a big effect on the output (the problem is highly non-linear), and stability of the predictor cannot be guaranteed (the magnitude of predicted values can grow without bound). Direct quantization of the α_k is therefore not feasible.
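The sensitivity can be illustrated numerically. A minimal sketch, assuming a toy 4th-order predictor and a deliberately coarse quantization step (all values are illustrative): round the α_k directly and check whether the quantized filter is still stable.

```python
import numpy as np

def is_stable(alpha):
    """Stability check: all roots of A(z) = 1 + sum alpha_k z^-k inside the unit circle."""
    roots = np.roots(np.concatenate(([1.0], alpha)))
    return bool(np.all(np.abs(roots) < 1.0))

# A stable 4th-order predictor: two resonances with poles at radius 0.97.
poles = 0.97 * np.exp(1j * np.array([0.3, -0.3, 1.2, -1.2]))
alpha = np.real(np.poly(poles))[1:]        # coefficients alpha_1 .. alpha_4

step = 0.25                                # coarse quantization step
alpha_q = step * np.round(alpha / step)    # direct uniform quantization

# The original is stable by construction; the directly quantized
# filter may lose stability even for a modest step size.
print(is_stable(alpha), is_stable(alpha_q))
```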

13 Linear prediction: Quantization and Coding. The best available method for quantization of linear predictors is based on a transform known as the line spectral pair (LSP) decomposition. The predictor is split into two parts, one symmetric and the other antisymmetric:
P(z) = A(z) + z^{-m-1} A(z^{-1})
Q(z) = A(z) - z^{-m-1} A(z^{-1})
where z^{-m-1} A(z^{-1}) is merely the backwards polynomial, with the coefficients α_k in reverse order. It follows that the original predictor can be reconstructed as A(z) = (1/2)[P(z) + Q(z)]. The line spectral polynomials P(z) and Q(z) thus contain all the information of A(z).
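A minimal numpy sketch of the decomposition, assuming the convention above and an illustrative 2nd-order predictor: build P(z) and Q(z) from the coefficient array and read the line spectral frequencies off the root angles.

```python
import numpy as np

def lsf(alpha):
    """Line spectral frequencies of A(z) = 1 + sum_{k=1}^m alpha_k z^-k."""
    a = np.concatenate(([1.0], np.asarray(alpha, dtype=float)))
    # P(z) = A(z) + z^-(m+1) A(1/z) and Q(z) = A(z) - z^-(m+1) A(1/z):
    # the backwards polynomial is simply the reversed coefficient array.
    p = np.concatenate((a, [0.0])) + np.concatenate(([0.0], a[::-1]))
    q = np.concatenate((a, [0.0])) - np.concatenate(([0.0], a[::-1]))
    freqs = []
    for poly in (p, q):
        ang = np.angle(np.roots(poly))
        # discard the trivial roots at z = 1 and z = -1 (angles 0 and pi)
        freqs.extend(ang[(ang > 1e-8) & (ang < np.pi - 1e-8)])
    return np.sort(np.array(freqs))

alpha = np.array([-1.58, 0.81])   # stable 2nd-order predictor, poles at radius 0.9
w = lsf(alpha)                    # two interlaced frequencies in (0, pi)
# Reconstruction check: A(z) = (P(z) + Q(z)) / 2 recovers the coefficients.
```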

14 Linear prediction: Quantization and Coding. The line spectral polynomials P(z) and Q(z) have the following properties:
If A(z) is stable, then the roots of P(z) and Q(z) are alternating (interlaced) on the unit circle.
If the roots of P(z) and Q(z) are alternating (interlaced) on the unit circle, then the reconstructed A(z) is stable.
Small errors in the locations of the roots of P(z) and Q(z) cause small errors in the output.
It follows that the angles of the roots, the line spectral frequencies, entirely describe the predictor and are robust to quantization errors. This is a perfect domain for quantization and coding.

15 Quantization and Coding: LSF. [Figure: z-plane plot of the roots of A(z), P(z) and Q(z); the roots of P(z) and Q(z) lie interlaced on the unit circle.]

16 Quantization and Coding: LSF. [Figure: magnitude spectra (dB) of A(z), P(z) and Q(z) as a function of frequency (kHz).]

17 Quantization and Coding: Background. Given a predictor of order M, we obtain M line spectral frequencies which exactly describe the predictor and thus the spectral envelope. We then need to quantize and code the frequencies such that the envelope can be transmitted with as few bits and as high accuracy as possible. In general, spectra can have a very complicated structure, whereby straightforward methods such as direct scalar quantization with entropy coding are suboptimal. Vector quantization and coding gives (under some loose assumptions) optimal performance, at the cost of higher complexity.

18 Quantization and Coding: Idea. Vector coding is based on the idea of a codebook representation of the signal. If x is an input vector and the vectors c_k, with k in S, form the codebook, then we can find the best match k* = arg min_k d(x, c_k), (1) whereby c_{k*} ≈ x. We then need to transmit only the codebook index k*.
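A minimal sketch of the codebook search in equation (1), assuming a squared Euclidean distance d and a random illustrative codebook:

```python
import numpy as np

def vq_encode(x, codebook):
    """Return the index k* of the nearest codebook vector to x (eq. 1)."""
    dist = np.sum((codebook - x) ** 2, axis=1)   # squared Euclidean distances
    return int(np.argmin(dist))

rng = np.random.default_rng(0)
codebook = rng.standard_normal((512, 16))        # N = 2^9 vectors, dimension 16
x = codebook[37] + 0.01 * rng.standard_normal(16)  # input near entry 37

k = vq_encode(x, codebook)   # only the index k is transmitted
x_hat = codebook[k]          # the receiver reconstructs c_k from its own copy
```

The receiver holds an identical codebook, so transmitting the 9-bit index is enough to reconstruct the 16-dimensional vector.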

19 Quantization and Coding: Complexity. An inherent problem of vector coding is complexity. If the codebook has N elements, then we need to calculate the distance from the input vector x to N codebook vectors. If we have 30 bits for the codebook, then N = 2^30; with dimension M = 16 we then need about 2^34 operations to determine k*. Unfeasible! In practice, we can use a layered structure such that we start by quantizing x roughly with a small codebook (e.g. N = 2^9 = 512), and then proceed to quantize the estimation error with further small codebooks (multi-stage VQ). With three stages, each with N = 512, complexity drops to about 3 × 2^9 × 16 ≈ 2^14.6 operations, which is feasible. A multi-stage VQ is sub-optimal, but the reduction in accuracy is reasonably small.
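The multi-stage idea can be sketched as follows, again assuming Euclidean distance and illustrative random codebooks; each stage quantizes the error left by the previous one, and only the per-stage indices are transmitted:

```python
import numpy as np

def msvq_encode(x, stages):
    """Greedy multi-stage VQ: one index per stage codebook."""
    residual = x.copy()
    indices = []
    for cb in stages:
        k = int(np.argmin(np.sum((cb - residual) ** 2, axis=1)))
        indices.append(k)
        residual = residual - cb[k]        # the next stage codes this error
    return indices

def msvq_decode(indices, stages):
    """Reconstruction is the sum of the selected codevectors."""
    return sum(cb[k] for k, cb in zip(indices, stages))

rng = np.random.default_rng(1)
stages = [rng.standard_normal((512, 16)),        # coarse stage
          0.1 * rng.standard_normal((512, 16))]  # fine stage for the error

x = stages[0][5] + stages[1][17]   # a vector the two stages can hit exactly
idx = msvq_encode(x, stages)       # 2 * 512 comparisons for 2 * 9 = 18 bits
x_hat = msvq_decode(idx, stages)
# A single 18-bit codebook would instead need 2^18 comparisons.
```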

20 Quantization and Coding: Training. The question then remains: how to choose the codebook c_k? If the input data were easy to model, we could design such a model and obtain the same performance with lower complexity, but envelopes have a complicated structure! We must therefore train the codebook from data. We would like to find the solution to {c_k} = arg min_{c_k} E_n[ min_k d(x_n, c_k) ], (2) where E_n[·] is the expectation over all input vectors x_n and d(·) is a distance measure. This is a complicated minimization problem!

21 Quantization and Coding: Training/EM. Direct iterative estimation:
Algorithm 1: k-means (Expectation Maximisation)
Define an initial-guess codebook {c_k} as, for example, K randomly chosen vectors from the database of x_n.
repeat
  Find the best-matching codebook vector for each x_n.
  Update each codebook vector c_k to be the mean (centroid) of all vectors x_n assigned to it.
until converged
Demo!
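A minimal sketch of Algorithm 1, assuming Euclidean distance and synthetic 2-D training data; real training would use LSF vectors, and the cluster centers here are illustrative.

```python
import numpy as np

def kmeans_codebook(data, K, iters=20, seed=0):
    """Train a K-entry codebook by alternating assignment and centroid update."""
    rng = np.random.default_rng(seed)
    codebook = data[rng.choice(len(data), K, replace=False)]  # initial guess
    for _ in range(iters):
        # E-step: assign each training vector to its best-matching codevector.
        d = np.sum((data[:, None, :] - codebook[None, :, :]) ** 2, axis=2)
        nearest = np.argmin(d, axis=1)
        # M-step: move each codevector to the centroid of its assigned vectors.
        for k in range(K):
            members = data[nearest == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    return codebook

rng = np.random.default_rng(3)
data = np.concatenate([rng.normal(m, 0.1, (200, 2)) for m in (-2.0, 0.0, 2.0)])
cb = kmeans_codebook(data, K=3)   # three centroids, one per synthetic cluster
```

Each iteration can only decrease (or keep) the total distortion of equation (2), which is why the loop converges; it may, however, converge to a local optimum, which motivates the split-VQ initialization on the next slide.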

22 Quantization and Coding: Training/Split-VQ. Algorithm with better convergence:
Algorithm 2: Split Vector Quantization
Define an initial-guess codebook {c_k} as, for example, two randomly chosen vectors from the database of x_n.
Apply the k-means algorithm on this codebook.
repeat
  Split each codevector c_k into two vectors c_k - ε and c_k + ε.
  Apply the k-means algorithm on this codebook.
until codebook full
Demo!

23 Quantization and Coding: Training/Split-VQ. Many improved training algorithms exist; vector quantization was a very active field of research up to the 90s. A classic book is Gersho and Gray, Vector Quantization and Signal Compression (1992). VQ remains the optimal approach in terms of efficiency. Decorrelation is a newer alternative approach which attempts to extract orthogonal directions, such that they can be modelled and quantized independently. It is almost optimal, but with low complexity; it is not yet in widespread use.

24 Linear prediction: Summary. Linear prediction can be used to model the spectral envelope of speech signals. Line spectral frequencies are the most common and effective representation of linear predictors for quantization. Vector quantization and coding is applied to the line spectral frequencies.

25 Long-time Prediction: Assumptions and Objectives. Voiced sounds have a quasi-periodic structure caused by the oscillations of the vocal folds. The fundamental frequency is assumed to be slowly changing; however, the rate of change is much faster than in most other types of audio. It is usually less than 10 octaves/second, but sometimes up to 15 octaves/second can be observed (roughly one semitone during a 5 ms sub-frame). The pitch range is usually something like 85 to 400 Hz, and perceptual pitch resolution is roughly 2 Hz. The objective is to model a feature of speech production, the pitch, to enable coding with high efficiency. Experience has shown that long-time prediction is a very efficient tool for source modelling (it has a huge impact on SNR).

26 Long-time Prediction: Vocabulary. The fundamental frequency model is known by many names: long-time prediction (LTP), adaptive codebook (ACBK), fundamental frequency (F0) model, impulse train.

27 Long-time Prediction: Definition. A long-time predictor (LTP) can be defined by the pitch lag T and gain factor γ_P as F_0(z) = 1 - γ_P z^{-T}. In the time domain we can predict a future sample x_n as x̂_n = γ_P x_{n-T}. With a vector x_k = [x_{kN}, x_{kN+1}, ..., x_{kN+N-1}]^T we thus obtain x̂_k = γ_P x_{k-T}.

28 Long-time Prediction: Codebook. A long-time predictor can thus be interpreted as a vector codebook, where the codebook entries are the vectors x_{k-T} with different delays T. Since the past signal keeps changing from frame to frame, the codebook is signal-adaptive. This explains why long-time predictors are often known as the adaptive codebook.

29 Long-time Prediction: Optimization. We want to optimize the quality of the output signal x_k = H e_k, where x_k is a vector of the output signal and H is the convolution matrix corresponding to linear prediction. Perceptual weighting can be applied by multiplying with a weighting matrix W, such that our optimization task is min || B(e_k - ê_k) ||^2, where ê_k is the quantized residual and B = WH. We want to model the residual with long-time prediction, whereby ê_k = γ_P e_{k-T} and min || B(e_k - γ_P e_{k-T}) ||^2, which gives the optimal T and γ_P.

30 Long-time Prediction: Optimization (advanced topic). The objective function contains the multiplicative term γ_P e_{k-T}, whereby it has no simple analytic solution; we must use a trick to find the optimal parameters. The optimal value of γ_P can be found by setting the derivative to zero:
0 = (∂/∂γ_P) || B(e_k - γ_P e_{k-T}) ||^2 = -2 e_{k-T}^T B^T B (e_k - γ_P e_{k-T}),
whereby γ_P = (e_{k-T}^T B^T B e_k) / || B e_{k-T} ||^2.

31 Long-time Prediction: Optimization (advanced topic). Given the optimal γ_P, we can modify the objective function by inserting this value of γ_P. Through simple manipulations we find that
arg min_T || B(e_k - γ_P e_{k-T}) ||^2 = arg max_T (e_{k-T}^T B^T B e_k)^2 / || B e_{k-T} ||^2.
Optimizing this function gives us the optimal e_{k-T}, under the assumption that γ_P has its optimal value. Note that the above expression is (the square of) the normalized correlation between B e_k and B e_{k-T}. That is, we are trying to find the vector e_{k-T} which is closest to the direction of e_k.
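The lag search can be sketched as follows, with the simplifying assumption B = I (no perceptual weighting) and an artificial, perfectly periodic residual: the lag maximizing the normalized correlation is found by exhaustive search, and the gain then follows from the closed-form expression above.

```python
import numpy as np

def ltp_search(past, e_k, t_min, t_max):
    """Find the lag T maximizing (e_k^T e_{k-T})^2 / ||e_{k-T}||^2 (B = I),
    then the optimal gain gamma_P for that lag. Requires t_min >= len(e_k)."""
    N, L = len(e_k), len(past)
    best_T, best_score = t_min, -np.inf
    for T in range(t_min, t_max + 1):
        cand = past[L - T : L - T + N]            # e_{k-T}: N samples, T back
        score = (e_k @ cand) ** 2 / (cand @ cand)  # squared normalized correlation
        if score > best_score:
            best_T, best_score = T, score
    cand = past[L - best_T : L - best_T + N]
    gamma_p = (e_k @ cand) / (cand @ cand)         # closed-form optimal gain
    return best_T, gamma_p

rng = np.random.default_rng(4)
e = np.tile(rng.standard_normal(57), 6)   # periodic residual, true lag 57
T, g = ltp_search(e[:-40], e[-40:], t_min=40, t_max=160)
# For a perfectly periodic signal the search recovers the true lag with gain 1.
```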

32 Long-time Prediction: Summary. Long-time prediction is used to model the fundamental frequency. It can be implemented as a vector codebook, although it actually is a filter. The optimal e_{k-T} is found by maximizing the normalized correlation between e_k and e_{k-T}. The optimal γ_P can then be quantized directly. The long-time predictor thus requires transmission of two parameters: the lag T and the quantized gain γ_P.

33 Residual coding. Once the spectral envelope has been modelled with the linear predictor A(z) and the fundamental frequency with the long-time predictor F_0(z), we are left with a residual E(z) = X(z) A(z) F_0(z). The residual contains everything that the two predictors were unable to model; it is basically white noise. Noise has no structure, right? So how do we model noise? How can we model the structure of something that has none? What do we know about noise?

34 Residual coding. Let ε_n be an uncorrelated, zero-mean white-noise signal: E[ε_n] = 0 and E[ε_n ε_k] = 0 for n ≠ k. There is nothing to model there, but the energy (= variance) of the signal is σ^2 = E[ε_n^2], and we can encode the energy of the noise! If e_k is a vector of the residual, we can encode its energy γ_C^2 = ||e_k||^2. The model is thus e_k = γ_C ẽ_k, where ||ẽ_k||^2 = 1. The remaining problem then reduces to encoding a vector ẽ_k with ||ẽ_k||^2 = 1.

35 Residual coding: Distribution (advanced topic). We happen to know that speech signals are distributed more or less according to the Laplace distribution; this holds also for the residual vector e_k. The probability distribution of a Laplace-distributed vector is defined as p(e_k) = C exp(-||e_k||_1 / b). Since the length of the vector e_k is encoded separately, it suffices to encode only vectors of a fixed length ||e_k||_1 = p. The probability of such vectors is p(e_k | ||e_k||_1 = p) = C exp(-p/b) = constant. All vectors of the same length have the same probability, so we can use a codebook which contains all vectors of that length.

36 Residual coding: Algebraic coding. Consider a vector with ||e_k||_1 = 1 which is quantized to integer values. The possible vectors are:

Index  ε_0  ε_1  ε_2  ε_3  ε_4  ε_5
0      +1   0    0    0    0    0
1      -1   0    0    0    0    0
2      0    +1   0    0    0    0
3      0    -1   0    0    0    0
...
10     0    0    0    0    0    +1
11     0    0    0    0    0    -1

Here, clearly, we can encode the vectors with index = 2k + s, where k is the position of the pulse and s = 0 for a plus sign and s = 1 for a minus sign.

37 Residual coding: Algebraic coding. With ||e_k||_1 = 2 we can have either two separate pulses or two pulses at the same position, for example vectors such as:

ε_0  ε_1  ε_2  ε_3  ε_4  ε_5
+1   0    -1   0    0    0
0    +2   0    0    0    0
0    0    +1   +1   0    0

Here it is a bit more challenging to generate indices for the above vectors.

38 Residual coding: Algebraic coding. In the case ||e_k||_1 = 2 we can use index = 2N k_1 + 2 k_2 + s_1, where the k_n are the positions of the pulses, s_1 is the sign of the first pulse and N is the length of the vector. We can then deduce s_2 from the positions: if k_1 ≤ k_2 then s_2 = s_1, otherwise s_2 is the opposite sign. Example with N = 2:

Index  k_1  k_2  s_1  s_2  ε_0  ε_1
0      0    0    +    +    +2   0
1      0    0    -    -    -2   0
2      0    1    +    +    +1   +1
3      0    1    -    -    -1   -1
4      1    0    +    -    -1   +1
5      1    0    -    +    +1   -1
6      1    1    +    +    0    +2
7      1    1    -    -    0    -2
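The index rule can be sketched as an encoder/decoder pair; the comparison k_1 ≤ k_2 for deducing s_2 is an assumption filled in from the text.

```python
import numpy as np

def encode(k1, k2, s1, N):
    """Pack two pulse positions and the first sign (0 = plus, 1 = minus)."""
    return 2 * N * k1 + 2 * k2 + s1

def decode(index, N):
    """Recover the two-pulse vector from its algebraic index."""
    s1 = index % 2
    k2 = (index // 2) % N
    k1 = index // (2 * N)
    s2 = s1 if k1 <= k2 else 1 - s1     # second sign deduced from position order
    e = np.zeros(N)
    e[k1] += 1.0 - 2.0 * s1             # +1 for s = 0, -1 for s = 1
    e[k2] += 1.0 - 2.0 * s2             # pulses at the same position add up
    return e

# All 2 * N * 2 * N = 8 indices for N = 2 decode to 8 distinct codevectors,
# so no bit pattern is wasted on a duplicate description.
vectors = [decode(i, 2) for i in range(8)]
```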

39 Residual coding: Algebraic coding. With N = 2 and ||e_k||_1 = 2 we thus have 8 different codebook vectors, which can be encoded with log2 8 = 3 bits. If we encoded the same vector directly, giving one bit to each position k_1 and k_2 and one bit to each sign s_1 and s_2, we would need 4 bits. The vector [+1, +1] could then be either k_1 = 0, k_2 = 1, s_1 = s_2 = 0 or k_1 = 1, k_2 = 0, s_1 = s_2 = 0. We would have two descriptions for one vector, which is inefficient. With the algebraic coding rule we get the smallest possible bit consumption.

40 Residual coding: Algebraic coding. The same approach can be extended to arbitrary N and p. Algebraic coding can then be used to describe noise vectors e_k of any length p with a minimum number of bits. Since the method is based on an algebraic rule (= an algorithm), the noise codebook does not need to be stored: no storage is needed. Since the codevectors are sparse (if N is large and p is low, then e_k is mostly zeros), computations with e_k are simple to perform (low complexity).

41 Residual coding: Analysis by synthesis. To find the optimal quantization ê_k, we use the same optimization as for the LTP: arg max (ê_k^T B^T B e_k)^2 / || B ê_k ||^2, where ê_k is the quantized residual. Note that H ê_k corresponds to the synthesised output signal, and B ê_k = W H ê_k is thus the perceptually weighted, synthesised output signal. The above optimization thus evaluates (analyses) the quality of the synthesised output signal; consequently, this method is known as the analysis-by-synthesis method. To find the optimum we have to evaluate every possible quantization ê_k: this is a brute-force method.
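A brute-force analysis-by-synthesis search can be sketched as follows; the matrix B here is a random lower-triangular stand-in for the weighted synthesis matrix WH, and the codebook is a random illustrative one rather than an actual algebraic codebook.

```python
import numpy as np

def analysis_by_synthesis(e_k, B, codebook):
    """Brute force: maximize (e_hat^T B^T B e_k)^2 / ||B e_hat||^2 over e_hat."""
    target = B @ e_k                      # perceptually weighted target
    best_i, best_score = 0, -np.inf
    for i, cand in enumerate(codebook):
        synth = B @ cand                  # synthesize the candidate excitation
        score = (synth @ target) ** 2 / (synth @ synth)
        if score > best_score:
            best_i, best_score = i, score
    return best_i

rng = np.random.default_rng(5)
N = 16
B = np.tril(rng.standard_normal((N, N)))   # stand-in for WH (convolution matrices are lower triangular)
codebook = rng.standard_normal((64, N))
e_k = 2.5 * codebook[11]                   # target along codevector 11

i_star = analysis_by_synthesis(e_k, B, codebook)
```

Note that the score is invariant to the scale of the candidate, which is exactly why the gain γ_C can be quantized separately after the best direction is found.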

42 Residual coding: Analysis by synthesis. The final output is obtained as the sum of the contributions of the fundamental frequency model and the residual codebook: x_k = H(γ_P ê_{k-T} + γ_C ê_k), or equivalently as a time-domain convolution x_n = h_n * (γ_P ê_{n-T} + γ_C ê_n). Here h_n corresponds to the impulse response of the linear predictor, which is excited (filtered) by the codebook vectors ê_{k-T} and ê_k. Hence the name code-excited linear prediction (CELP), and, when using the algebraic residual codebook, algebraic code-excited linear prediction (ACELP).

43 Residual coding: Summary. The residual after modelling with linear prediction and long-time prediction is modelled with a noise codebook. We first encode the gain (energy) of the noise vector; secondly, we encode the fixed-length residual with an algebraic codebook. The algebraic codebook generates vectors with an algorithm, such that no storage is required. The best quantization is found by a brute-force search, also known as the analysis-by-synthesis method.

44 Conclusion. Speech coding is the digital compression of speech for transmission and storage. It is the most important speech processing application, with over 7 billion users. Most phones still use the old AMR standard, partly superseded by the newer AMR-WB, but the newest standard, EVS, will hopefully soon replace both. Code-excited linear prediction (CELP) is the standard approach and is used in all mainstream standards. It is based on source-filter modelling, where the spectral envelope is modelled with linear prediction, the fundamental frequency with a long-time predictor and the residual with a noise codebook. Parameters are optimized with a brute-force, analysis-by-synthesis method. Speech coding remains an active, math-intensive field of research.


More information

Source-Controlled Variable-Rate Multimode Wideband Speech Codec (VMR-WB), Service Options 62 and 63 for Spread Spectrum Systems

Source-Controlled Variable-Rate Multimode Wideband Speech Codec (VMR-WB), Service Options 62 and 63 for Spread Spectrum Systems GPP C.S00-A v.0 GPP C.S00-A Version.0 Date: April, 00 Source-Controlled Variable-Rate Multimode Wideband Speech Codec (VMR-WB), Service Options and for Spread Spectrum Systems COPYRIGHT GPP and its Organizational

More information

Automatic Speech Recognition (CS753)

Automatic Speech Recognition (CS753) Automatic Speech Recognition (CS753) Lecture 12: Acoustic Feature Extraction for ASR Instructor: Preethi Jyothi Feb 13, 2017 Speech Signal Analysis Generate discrete samples A frame Need to focus on short

More information

Mel-Generalized Cepstral Representation of Speech A Unified Approach to Speech Spectral Estimation. Keiichi Tokuda

Mel-Generalized Cepstral Representation of Speech A Unified Approach to Speech Spectral Estimation. Keiichi Tokuda Mel-Generalized Cepstral Representation of Speech A Unified Approach to Speech Spectral Estimation Keiichi Tokuda Nagoya Institute of Technology Carnegie Mellon University Tamkang University March 13,

More information

1. Probability density function for speech samples. Gamma. Laplacian. 2. Coding paradigms. =(2X max /2 B ) for a B-bit quantizer Δ Δ Δ Δ Δ

1. Probability density function for speech samples. Gamma. Laplacian. 2. Coding paradigms. =(2X max /2 B ) for a B-bit quantizer Δ Δ Δ Δ Δ Digital Speech Processing Lecture 16 Speech Coding Methods Based on Speech Waveform Representations and Speech Models Adaptive and Differential Coding 1 Speech Waveform Coding-Summary of Part 1 1. Probability

More information

VID3: Sampling and Quantization

VID3: Sampling and Quantization Video Transmission VID3: Sampling and Quantization By Prof. Gregory D. Durgin copyright 2009 all rights reserved Claude E. Shannon (1916-2001) Mathematician and Electrical Engineer Worked for Bell Labs

More information

LPC and Vector Quantization

LPC and Vector Quantization LPC and Vector Quantization JanČernocký,ValentinaHubeikaFITBUTBrno When modeling speech production based on LPC, we assume that the excitation is passed through the linear filter: H(z) = A(z) G,where A(z)isaP-thorderpolynome:

More information

LATTICE VECTOR QUANTIZATION FOR IMAGE CODING USING EXPANSION OF CODEBOOK

LATTICE VECTOR QUANTIZATION FOR IMAGE CODING USING EXPANSION OF CODEBOOK LATTICE VECTOR QUANTIZATION FOR IMAGE CODING USING EXPANSION OF CODEBOOK R. R. Khandelwal 1, P. K. Purohit 2 and S. K. Shriwastava 3 1 Shri Ramdeobaba College Of Engineering and Management, Nagpur richareema@rediffmail.com

More information

Feature extraction 2

Feature extraction 2 Centre for Vision Speech & Signal Processing University of Surrey, Guildford GU2 7XH. Feature extraction 2 Dr Philip Jackson Linear prediction Perceptual linear prediction Comparison of feature methods

More information

Rate-Constrained Multihypothesis Prediction for Motion-Compensated Video Compression

Rate-Constrained Multihypothesis Prediction for Motion-Compensated Video Compression IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL 12, NO 11, NOVEMBER 2002 957 Rate-Constrained Multihypothesis Prediction for Motion-Compensated Video Compression Markus Flierl, Student

More information

Re-estimation of Linear Predictive Parameters in Sparse Linear Prediction

Re-estimation of Linear Predictive Parameters in Sparse Linear Prediction Downloaded from vbnaaudk on: januar 12, 2019 Aalborg Universitet Re-estimation of Linear Predictive Parameters in Sparse Linear Prediction Giacobello, Daniele; Murthi, Manohar N; Christensen, Mads Græsbøll;

More information

Modern Digital Communication Techniques Prof. Suvra Sekhar Das G. S. Sanyal School of Telecommunication Indian Institute of Technology, Kharagpur

Modern Digital Communication Techniques Prof. Suvra Sekhar Das G. S. Sanyal School of Telecommunication Indian Institute of Technology, Kharagpur Modern Digital Communication Techniques Prof. Suvra Sekhar Das G. S. Sanyal School of Telecommunication Indian Institute of Technology, Kharagpur Lecture - 15 Analog to Digital Conversion Welcome to the

More information

L. Yaroslavsky. Fundamentals of Digital Image Processing. Course

L. Yaroslavsky. Fundamentals of Digital Image Processing. Course L. Yaroslavsky. Fundamentals of Digital Image Processing. Course 0555.330 Lec. 6. Principles of image coding The term image coding or image compression refers to processing image digital data aimed at

More information

COMPARISON OF WINDOWING SCHEMES FOR SPEECH CODING. Johannes Fischer * and Tom Bäckström * Fraunhofer IIS, Am Wolfsmantel 33, Erlangen, Germany

COMPARISON OF WINDOWING SCHEMES FOR SPEECH CODING. Johannes Fischer * and Tom Bäckström * Fraunhofer IIS, Am Wolfsmantel 33, Erlangen, Germany COMPARISON OF WINDOWING SCHEMES FOR SPEECH CODING Johannes Fischer * and Tom Bäcström * * International Audio Laboratories Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg (FAU) Fraunhofer IIS,

More information

Quantization of LSF Parameters Using A Trellis Modeling

Quantization of LSF Parameters Using A Trellis Modeling 1 Quantization of LSF Parameters Using A Trellis Modeling Farshad Lahouti, Amir K. Khandani Coding and Signal Transmission Lab. Dept. of E&CE, University of Waterloo, Waterloo, ON, N2L 3G1, Canada (farshad,

More information

Timbral, Scale, Pitch modifications

Timbral, Scale, Pitch modifications Introduction Timbral, Scale, Pitch modifications M2 Mathématiques / Vision / Apprentissage Audio signal analysis, indexing and transformation Page 1 / 40 Page 2 / 40 Modification of playback speed Modifications

More information

Audio Coding. Fundamentals Quantization Waveform Coding Subband Coding P NCTU/CSIE DSPLAB C.M..LIU

Audio Coding. Fundamentals Quantization Waveform Coding Subband Coding P NCTU/CSIE DSPLAB C.M..LIU Audio Coding P.1 Fundamentals Quantization Waveform Coding Subband Coding 1. Fundamentals P.2 Introduction Data Redundancy Coding Redundancy Spatial/Temporal Redundancy Perceptual Redundancy Compression

More information

Multirate signal processing

Multirate signal processing Multirate signal processing Discrete-time systems with different sampling rates at various parts of the system are called multirate systems. The need for such systems arises in many applications, including

More information

Sound 2: frequency analysis

Sound 2: frequency analysis COMP 546 Lecture 19 Sound 2: frequency analysis Tues. March 27, 2018 1 Speed of Sound Sound travels at about 340 m/s, or 34 cm/ ms. (This depends on temperature and other factors) 2 Wave equation Pressure

More information

E303: Communication Systems

E303: Communication Systems E303: Communication Systems Professor A. Manikas Chair of Communications and Array Processing Imperial College London Principles of PCM Prof. A. Manikas (Imperial College) E303: Principles of PCM v.17

More information

Chapter 10 Applications in Communications

Chapter 10 Applications in Communications Chapter 10 Applications in Communications School of Information Science and Engineering, SDU. 1/ 47 Introduction Some methods for digitizing analog waveforms: Pulse-code modulation (PCM) Differential PCM

More information

Multimedia Systems Giorgio Leonardi A.A Lecture 4 -> 6 : Quantization

Multimedia Systems Giorgio Leonardi A.A Lecture 4 -> 6 : Quantization Multimedia Systems Giorgio Leonardi A.A.2014-2015 Lecture 4 -> 6 : Quantization Overview Course page (D.I.R.): https://disit.dir.unipmn.it/course/view.php?id=639 Consulting: Office hours by appointment:

More information

CEPSTRAL ANALYSIS SYNTHESIS ON THE MEL FREQUENCY SCALE, AND AN ADAPTATIVE ALGORITHM FOR IT.

CEPSTRAL ANALYSIS SYNTHESIS ON THE MEL FREQUENCY SCALE, AND AN ADAPTATIVE ALGORITHM FOR IT. CEPSTRAL ANALYSIS SYNTHESIS ON THE EL FREQUENCY SCALE, AND AN ADAPTATIVE ALGORITH FOR IT. Summarized overview of the IEEE-publicated papers Cepstral analysis synthesis on the mel frequency scale by Satochi

More information

Low Bit-Rate Speech Codec Based on a Long-Term Harmonic Plus Noise Model

Low Bit-Rate Speech Codec Based on a Long-Term Harmonic Plus Noise Model PAPERS Journal of the Audio Engineering Society Vol. 64, No. 11, November 216 ( C 216) DOI: https://doi.org/1.17743/jaes.216.28 Low Bit-Rate Speech Codec Based on a Long-Term Harmonic Plus Noise Model

More information

University of Cambridge. MPhil in Computer Speech Text & Internet Technology. Module: Speech Processing II. Lecture 2: Hidden Markov Models I

University of Cambridge. MPhil in Computer Speech Text & Internet Technology. Module: Speech Processing II. Lecture 2: Hidden Markov Models I University of Cambridge MPhil in Computer Speech Text & Internet Technology Module: Speech Processing II Lecture 2: Hidden Markov Models I o o o o o 1 2 3 4 T 1 b 2 () a 12 2 a 3 a 4 5 34 a 23 b () b ()

More information

Vector Quantization. Institut Mines-Telecom. Marco Cagnazzo, MN910 Advanced Compression

Vector Quantization. Institut Mines-Telecom. Marco Cagnazzo, MN910 Advanced Compression Institut Mines-Telecom Vector Quantization Marco Cagnazzo, cagnazzo@telecom-paristech.fr MN910 Advanced Compression 2/66 19.01.18 Institut Mines-Telecom Vector Quantization Outline Gain-shape VQ 3/66 19.01.18

More information

Log Likelihood Spectral Distance, Entropy Rate Power, and Mutual Information with Applications to Speech Coding

Log Likelihood Spectral Distance, Entropy Rate Power, and Mutual Information with Applications to Speech Coding entropy Article Log Likelihood Spectral Distance, Entropy Rate Power, and Mutual Information with Applications to Speech Coding Jerry D. Gibson * and Preethi Mahadevan Department of Electrical and Computer

More information

Revision of Lecture 4

Revision of Lecture 4 Revision of Lecture 4 We have discussed all basic components of MODEM Pulse shaping Tx/Rx filter pair Modulator/demodulator Bits map symbols Discussions assume ideal channel, and for dispersive channel

More information

On Compression Encrypted Data part 2. Prof. Ja-Ling Wu The Graduate Institute of Networking and Multimedia National Taiwan University

On Compression Encrypted Data part 2. Prof. Ja-Ling Wu The Graduate Institute of Networking and Multimedia National Taiwan University On Compression Encrypted Data part 2 Prof. Ja-Ling Wu The Graduate Institute of Networking and Multimedia National Taiwan University 1 Brief Summary of Information-theoretic Prescription At a functional

More information

Finite Word Length Effects and Quantisation Noise. Professors A G Constantinides & L R Arnaut

Finite Word Length Effects and Quantisation Noise. Professors A G Constantinides & L R Arnaut Finite Word Length Effects and Quantisation Noise 1 Finite Word Length Effects Finite register lengths and A/D converters cause errors at different levels: (i) input: Input quantisation (ii) system: Coefficient

More information

The Choice of MPEG-4 AAC encoding parameters as a direct function of the perceptual entropy of the audio signal

The Choice of MPEG-4 AAC encoding parameters as a direct function of the perceptual entropy of the audio signal The Choice of MPEG-4 AAC encoding parameters as a direct function of the perceptual entropy of the audio signal Claus Bauer, Mark Vinton Abstract This paper proposes a new procedure of lowcomplexity to

More information

Improved Method for Epoch Extraction in High Pass Filtered Speech

Improved Method for Epoch Extraction in High Pass Filtered Speech Improved Method for Epoch Extraction in High Pass Filtered Speech D. Govind Center for Computational Engineering & Networking Amrita Vishwa Vidyapeetham (University) Coimbatore, Tamilnadu 642 Email: d

More information

Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments

Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments Dr. Jian Zhang Conjoint Associate Professor NICTA & CSE UNSW COMP9519 Multimedia Systems S2 2006 jzhang@cse.unsw.edu.au

More information

representation of speech

representation of speech Digital Speech Processing Lectures 7-8 Time Domain Methods in Speech Processing 1 General Synthesis Model voiced sound amplitude Log Areas, Reflection Coefficients, Formants, Vocal Tract Polynomial, l

More information

Class of waveform coders can be represented in this manner

Class of waveform coders can be represented in this manner Digital Speech Processing Lecture 15 Speech Coding Methods Based on Speech Waveform Representations ti and Speech Models Uniform and Non- Uniform Coding Methods 1 Analog-to-Digital Conversion (Sampling

More information

CODING SAMPLE DIFFERENCES ATTEMPT 1: NAIVE DIFFERENTIAL CODING

CODING SAMPLE DIFFERENCES ATTEMPT 1: NAIVE DIFFERENTIAL CODING 5 0 DPCM (Differential Pulse Code Modulation) Making scalar quantization work for a correlated source -- a sequential approach. Consider quantizing a slowly varying source (AR, Gauss, ρ =.95, σ 2 = 3.2).

More information

SUBOPTIMALITY OF THE KARHUNEN-LOÈVE TRANSFORM FOR FIXED-RATE TRANSFORM CODING. Kenneth Zeger

SUBOPTIMALITY OF THE KARHUNEN-LOÈVE TRANSFORM FOR FIXED-RATE TRANSFORM CODING. Kenneth Zeger SUBOPTIMALITY OF THE KARHUNEN-LOÈVE TRANSFORM FOR FIXED-RATE TRANSFORM CODING Kenneth Zeger University of California, San Diego, Department of ECE La Jolla, CA 92093-0407 USA ABSTRACT An open problem in

More information

4.2 Acoustics of Speech Production

4.2 Acoustics of Speech Production 4.2 Acoustics of Speech Production Acoustic phonetics is a field that studies the acoustic properties of speech and how these are related to the human speech production system. The topic is vast, exceeding

More information

Gaussian Mixture Model-based Quantization of Line Spectral Frequencies for Adaptive Multirate Speech Codec

Gaussian Mixture Model-based Quantization of Line Spectral Frequencies for Adaptive Multirate Speech Codec Journal of Computing and Information Technology - CIT 19, 2011, 2, 113 126 doi:10.2498/cit.1001767 113 Gaussian Mixture Model-based Quantization of Line Spectral Frequencies for Adaptive Multirate Speech

More information

Sinusoidal Modeling. Yannis Stylianou SPCC University of Crete, Computer Science Dept., Greece,

Sinusoidal Modeling. Yannis Stylianou SPCC University of Crete, Computer Science Dept., Greece, Sinusoidal Modeling Yannis Stylianou University of Crete, Computer Science Dept., Greece, yannis@csd.uoc.gr SPCC 2016 1 Speech Production 2 Modulators 3 Sinusoidal Modeling Sinusoidal Models Voiced Speech

More information

Basic Multi-rate Operations: Decimation and Interpolation

Basic Multi-rate Operations: Decimation and Interpolation 1 Basic Multirate Operations 2 Interconnection of Building Blocks 1.1 Decimation and Interpolation 1.2 Digital Filter Banks Basic Multi-rate Operations: Decimation and Interpolation Building blocks for

More information

Original application to speech was for data compression. Factor of about 10 But data compression is of less interest for speech these days

Original application to speech was for data compression. Factor of about 10 But data compression is of less interest for speech these days Original application to speech was for data compression Factor of about 10 But data compression is of less interest for speech these days Lives on because the compression method works by approximating

More information

AdaptiveFilters. GJRE-F Classification : FOR Code:

AdaptiveFilters. GJRE-F Classification : FOR Code: Global Journal of Researches in Engineering: F Electrical and Electronics Engineering Volume 14 Issue 7 Version 1.0 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals

More information

Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments. Tutorial 1. Acknowledgement and References for lectures 1 to 5

Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments. Tutorial 1. Acknowledgement and References for lectures 1 to 5 Lecture : Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments Dr. Jian Zhang Conjoint Associate Professor NICTA & CSE UNSW COMP959 Multimedia Systems S 006 jzhang@cse.unsw.edu.au Acknowledgement

More information

Advanced 3G and 4G Wireless Communication Prof. Aditya K. Jagannatham Department of Electrical Engineering Indian Institute of Technology, Kanpur

Advanced 3G and 4G Wireless Communication Prof. Aditya K. Jagannatham Department of Electrical Engineering Indian Institute of Technology, Kanpur Advanced 3G and 4G Wireless Communication Prof. Aditya K. Jagannatham Department of Electrical Engineering Indian Institute of Technology, Kanpur Lecture - 12 Doppler Spectrum and Jakes Model Welcome to

More information

COMP Signals and Systems. Dr Chris Bleakley. UCD School of Computer Science and Informatics.

COMP Signals and Systems. Dr Chris Bleakley. UCD School of Computer Science and Informatics. COMP 40420 2. Signals and Systems Dr Chris Bleakley UCD School of Computer Science and Informatics. Scoil na Ríomheolaíochta agus an Faisnéisíochta UCD. Introduction 1. Signals 2. Systems 3. System response

More information

Vector Quantization Encoder Decoder Original Form image Minimize distortion Table Channel Image Vectors Look-up (X, X i ) X may be a block of l

Vector Quantization Encoder Decoder Original Form image Minimize distortion Table Channel Image Vectors Look-up (X, X i ) X may be a block of l Vector Quantization Encoder Decoder Original Image Form image Vectors X Minimize distortion k k Table X^ k Channel d(x, X^ Look-up i ) X may be a block of l m image or X=( r, g, b ), or a block of DCT

More information

Multimedia Networking ECE 599

Multimedia Networking ECE 599 Multimedia Networking ECE 599 Prof. Thinh Nguyen School of Electrical Engineering and Computer Science Based on lectures from B. Lee, B. Girod, and A. Mukherjee 1 Outline Digital Signal Representation

More information

ETSI EN V7.0.1 ( )

ETSI EN V7.0.1 ( ) EN 3 969 V7.. (-) European Standard (Telecommunications series) Digital cellular telecommunications system (Phase +); Half rate speech; Half rate speech transcoding (GSM 6. version 7.. Release 998) GLOBAL

More information

(51) Int Cl. 7 : G10L 19/12

(51) Int Cl. 7 : G10L 19/12 (19) Europäisches Patentamt European Patent Office Office européen des brevets *EP000994B1* (11) EP 0 991 04 B1 (12) EUROPEAN PATENT SPECIFICATION (4) Date of publication and mention of the grant of the

More information

Research Article Using Geometrical Properties for Fast Indexation of Gaussian Vector Quantizers

Research Article Using Geometrical Properties for Fast Indexation of Gaussian Vector Quantizers Hindawi Publishing Corporation EURASIP Journal on Advances in Signal Processing Volume 7, Article ID 6319, 11 pages doi:1.11/7/6319 Research Article Using Geometrical Properties for Fast Indexation of

More information

3. ESTIMATION OF SIGNALS USING A LEAST SQUARES TECHNIQUE

3. ESTIMATION OF SIGNALS USING A LEAST SQUARES TECHNIQUE 3. ESTIMATION OF SIGNALS USING A LEAST SQUARES TECHNIQUE 3.0 INTRODUCTION The purpose of this chapter is to introduce estimators shortly. More elaborated courses on System Identification, which are given

More information

Entropy-constrained quantization of exponentially damped sinusoids parameters

Entropy-constrained quantization of exponentially damped sinusoids parameters Entropy-constrained quantization of exponentially damped sinusoids parameters Olivier Derrien, Roland Badeau, Gaël Richard To cite this version: Olivier Derrien, Roland Badeau, Gaël Richard. Entropy-constrained

More information

Error Spectrum Shaping and Vector Quantization. Jon Dattorro Christine Law

Error Spectrum Shaping and Vector Quantization. Jon Dattorro Christine Law Error Spectrum Shaping and Vector Quantization Jon Dattorro Christine Law in partial fulfillment of the requirements for EE392c Stanford University Autumn 1997 0. Introduction We view truncation noise

More information

Vector Quantizers for Reduced Bit-Rate Coding of Correlated Sources

Vector Quantizers for Reduced Bit-Rate Coding of Correlated Sources Vector Quantizers for Reduced Bit-Rate Coding of Correlated Sources Russell M. Mersereau Center for Signal and Image Processing Georgia Institute of Technology Outline Cache vector quantization Lossless

More information

A Study of Source Controlled Channel Decoding for GSM AMR Vocoder

A Study of Source Controlled Channel Decoding for GSM AMR Vocoder A Study of Source Controlled Channel Decoding for GSM AMR Vocoder K.S.M. Phanindra Girish A Redekar David Koilpillai Department of Electrical Engineering, IIT Madras, Chennai-6000036. phanindra@midascomm.com,

More information

Relationship Between λ and Q in RDO

Relationship Between λ and Q in RDO Jmspeex Journal of Dubious Theoretical Results July 6, 015 Abstract This is a log of theoretical calculations and approximations that are used in some of the Daala code. Some approximations are likely

More information