
Speech Coding. Speech Processing. Tom Bäckström, Aalto University, October 2015.

Introduction Speech coding refers to the digital compression of speech signals for telecommunication (and storage) applications. There are over 7.6 billion active mobile subscriptions and more than 3.7 billion unique subscribers (https://gsmaintelligence.com/). That is more mobile phones than people; more than half the population owns a mobile phone. Speech coding is thus the biggest speech processing application. This demonstrates that speech coding is important and that it works with sufficient quality, and that further improvements will also have a major impact.

Introduction Generations in mobile networks:
1978  1G  First widely deployed analog cellular system, AMPS.
1991  2G  First digital systems, GSM and CDMA.
2001  3G  EDGE and HSPA systems with increased data speeds.
2009  4G  LTE systems with even faster data.
2014      More mobile phones than people.
201?  5G  Internet of things?

Introduction Some speech coding standards:
Name     Year  Bit-rate                Bandwidth
NMT      1981  analog                  3.5 kHz
GSM      1987  13 kbit/s               3.5 kHz
EFR      1996  12.2 kbit/s             3.5 kHz
AMR      1999  4.75 ... 12.2 kbit/s    3.5 kHz
AMR-WB   2001  6.6 ... 23.85 kbit/s    7 kHz
AMR-WB+  2001  5.2 ... 48 kbit/s       7 kHz
EVS      2014  5.9 ... 128 kbit/s      3.5 kHz ... 48 kHz
AMR has been the most successful codec of all time and is still widely used. Deployment of AMR-WB is progressing (it started in 2006, but is still not finished). EVS is the first codec with native support for IP networks (LTE), and its deployment could start in 2016.

[Figures: quality comparison as a function of bitrate for clean speech, noisy speech, and mixed content (music and speech). Bandwidths: NB 3.5 kHz, WB 7 kHz, SWB 16 kHz, FB 22 kHz.]

Speech Production Modelling [Figure: block diagram in which an impulse train shaped by the long-time predictor F_0(z), plus a residual, excites the linear predictor A(z) to produce the output X(z).] In the speech production modelling lecture we already presented the source-filter model (above), where a linear predictor A(z) models the acoustic effect of the vocal tract, a long-time predictor F_0(z) models the periodic structure of the excitation (the fundamental frequency), and the residual (excitation) E(z) is modelled with a noise codebook. The approach is known as Code-Excited Linear Prediction (CELP).

Speech Production Modelling To be accurate, the long-time predictor is applied as a filter, whereby it is not an additive but a cascaded element.* [Figure: residual codebook, long-time predictor and linear predictor in cascade, producing speech.] The entire model can then be written as X(z) = A^{-1}(z) F_0^{-1}(z) E(z), or equivalently x_n = h_n * f_n * e_n, where h_n is the impulse response of H(z) = A^{-1}(z). Next, each of these components is presented in detail. *The source-filter model is a good model and it is used in some applications. However, although it is often advertised that speech coding is also based on this approach, that is not entirely accurate.

Linear prediction Linear prediction is a model of the spectral envelope of a speech signal: it models the overall shape of the spectrum. It can be interpreted as a tube model of the vocal tract, but estimating the parameters of a tube model is difficult. Besides, by modelling everything in the spectral envelope, we do not need a separate model for other effects such as the glottal excitation. Estimation of linear predictors was already discussed in a previous lecture. The residual of linear prediction, A(z)X(z) = F_0^{-1}(z)E(z), has only a harmonic structure and/or noise.

Linear prediction Quantization and Coding Our task is to quantize the parameters of the linear predictive filter A(z) and encode them. In that process, the aim is to minimize the perceptually weighted error between the original filter A^{-1}(z) and the quantized filter Â^{-1}(z), that is, min ‖W(z) (A^{-1}(z) - Â^{-1}(z))‖². Here W(z) is the perceptual weighting filter. Such an objective ensures that the perceptual effect of quantization is minimized, whereby the receiver can reconstruct the speech signal such that it maximally resembles the original signal.

Linear prediction Quantization and Coding A linear predictor is defined as A(z) = 1 + Σ_{k=1}^{m} α_k z^{-k}, whereby its parameters are the m scalars α_k. These parameters entirely describe the linear predictor, whereby our objective is to quantize them. Unfortunately, the α_k are sensitive to errors: small errors in α_k can have a big effect on the output (the problem is highly non-linear), and the stability of the predictor cannot be guaranteed (the magnitude of predicted values can grow without bound). Direct quantization of the α_k is therefore not feasible.
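As a small numerical illustration of this sensitivity (the second-order coefficients below are an arbitrary example, not from the slides), a modest error in one coefficient can push a pole outside the unit circle:

```python
import numpy as np

# A(z) = 1 - 1.8 z^-1 + 0.97 z^-2: complex pole pair at radius sqrt(0.97), stable
poles = np.roots([1.0, -1.8, 0.97])
print(np.max(np.abs(poles)))      # about 0.985, inside the unit circle

# Quantizing 0.97 -> 1.02 (an error of only 0.05) makes the predictor unstable
poles_bad = np.roots([1.0, -1.8, 1.02])
print(np.max(np.abs(poles_bad)))  # about 1.010, outside the unit circle
```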

Linear prediction Quantization and Coding The best available method for quantization of linear predictors is based on a transform known as the line spectral pair (LSP) decomposition. The predictor is split into two parts, one symmetric and the other antisymmetric: P(z) = A(z) + z^{-m-1} A(z^{-1}) and Q(z) = A(z) - z^{-m-1} A(z^{-1}), where z^{-m-1} A(z^{-1}) is merely the backwards polynomial, with the coefficients α_k in reverse order. It follows that the original predictor can be reconstructed as A(z) = (1/2)[P(z) + Q(z)]. The line spectral polynomials P(z) and Q(z) thus contain all the information of A(z).
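A minimal sketch of this decomposition with NumPy (coefficient convention assumed: a[0] = 1, a[k] = α_k; the example predictor is arbitrary):

```python
import numpy as np

def lsp_decompose(a):
    """Split a = [1, a1, ..., am] into the coefficients of
    P(z) = A(z) + z^(-m-1) A(1/z) and Q(z) = A(z) - z^(-m-1) A(1/z)."""
    a = np.asarray(a, dtype=float)
    fwd = np.concatenate((a, [0.0]))        # A(z), padded to degree m+1
    bwd = np.concatenate(([0.0], a[::-1]))  # z^(-m-1) A(1/z): reversed coefficients
    return fwd + bwd, fwd - bwd

def lsp_reconstruct(p, q):
    """A(z) = (P(z) + Q(z)) / 2; the highest-order coefficient cancels to zero."""
    return ((p + q) / 2)[:-1]

a = [1.0, -1.6, 0.9]          # a stable second-order predictor
p, q = lsp_decompose(a)
print(np.abs(np.roots(p)))    # all roots of P(z) lie on the unit circle
print(np.abs(np.roots(q)))    # likewise for Q(z)
print(lsp_reconstruct(p, q))  # recovers [1.0, -1.6, 0.9]
```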

Linear prediction Quantization and Coding The line spectral polynomials P(z) and Q(z) have the following properties. If A(z) is stable, then the roots of P(z) and Q(z) are alternating (interlaced) on the unit circle. Conversely, if the roots of P(z) and Q(z) are alternating (interlaced) on the unit circle, then the reconstructed A(z) is stable. Small errors in the locations of the roots of P(z) and Q(z) cause small errors in the output. It follows that the angles of the roots, the line spectral frequencies, entirely describe the predictor and are robust to quantization errors. This is a perfect domain for quantization and coding.

Quantization and Coding LSF [Figure: roots of A(z), P(z) and Q(z) in the z-plane; the roots of P(z) and Q(z) are interlaced on the unit circle.]

Quantization and Coding LSF [Figure: magnitude spectra (dB) of A(z), P(z) and Q(z) over 0 to 6 kHz.]

Quantization and Coding Background Given a predictor of order M, we obtain M line spectral frequencies which exactly describe the predictor and thus the spectral envelope. We then need to quantize and encode these frequencies such that the envelope can be transmitted with as few bits and as high accuracy as possible. In general, spectra can have a very complicated structure, whereby straightforward methods such as direct quantization followed by entropy coding are suboptimal. Vector quantization and coding gives (under some loose assumptions) always optimal performance, at the cost of higher complexity.

Quantization and Coding Idea Vector coding is based on the idea of a codebook representation of the signal. If x is an input vector and the vectors c_k with k ∈ S form the codebook, then we can find the best match k* = arg min_k d(x, c_k), whereby c_{k*} ≈ x. We then need to transmit only the codebook index k*.

Quantization and Coding Complexity An inherent problem of vector coding is complexity. If the codebook has N elements, then we need to calculate the distance from the input vector x to all N codebook vectors. If we have 30 bits for the codebook, then N = 2^30. With dimension M = 16 we then need about 2^34 operations to determine k*. Unfeasible! In practice we can use a layered structure, such that we start by quantizing x roughly with a small codebook (e.g. N = 2^9 = 512), and then proceed to quantize the estimation error with further small codebooks (multi-stage VQ). With three stages of N = 512 each, we reduce the complexity to about 3 · 2^13 operations, which is feasible. A multi-stage VQ is sub-optimal, but the reduction in accuracy is reasonably small.
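A toy sketch of such a multi-stage VQ (the codebooks here are random samples of the data rather than trained ones, and the sizes are illustrative, not from any codec):

```python
import numpy as np

rng = np.random.default_rng(0)

def nearest(codebook, x):
    """Index of the codevector closest to x in Euclidean distance."""
    return int(np.argmin(np.sum((codebook - x) ** 2, axis=1)))

M = 16                                  # vector dimension
data = rng.normal(size=(1000, M))       # stand-in for training vectors

# Stage 1: 32 entries (5 bits); stage 2 quantizes the stage-1 error.
cb1 = data[rng.choice(len(data), 32, replace=False)]
errors = np.array([x - cb1[nearest(cb1, x)] for x in data])
cb2 = np.vstack([errors[rng.choice(len(errors), 31, replace=False)],
                 np.zeros(M)])          # include 0 so stage 2 never hurts

def msvq_encode(x):
    i1 = nearest(cb1, x)
    i2 = nearest(cb2, x - cb1[i1])
    return i1, i2

def msvq_decode(i1, i2):
    return cb1[i1] + cb2[i2]
```

The two 5-bit stages cost 32 + 32 = 64 distance evaluations per vector, whereas a single flat 10-bit codebook of the same total rate would cost 1024.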

Quantization and Coding Training The question then remains: how do we choose the codebook c_k? If the input data were easy to model, we could design such a model and obtain the same performance with lower complexity, but envelopes have a complicated structure! We must therefore train the codebook from data. We would like to find the solution of {c_k} = arg min_{c_k} E_n[ min_k d(x_n, c_k) ], where E_n[·] is the expectation over all input vectors x_n and d(·) is a distance measure. This is a complicated minimization problem!

Quantization and Coding Training/EM Direct iterative estimation. Algorithm 1: k-means (expectation maximisation). Define an initial-guess codebook {c_k} as, for example, K randomly chosen vectors from the database of x_n. Repeat: find the best-matching codebook vector for each x_n, then update each codebook vector c_k to be the mean (centroid) of all vectors x_n assigned to it; until converged. Demo!

Quantization and Coding Training/Split-VQ An algorithm with better convergence. Algorithm 2: Split Vector Quantization. Define an initial-guess codebook {c_k} as, for example, two randomly chosen vectors from the database of x_n, and apply the k-means algorithm on this codebook. Repeat: split each codevector c_k into two vectors c_k − ε and c_k + ε, and apply the k-means algorithm on the enlarged codebook; until the codebook is full. Demo!
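The two training algorithms above can be sketched as follows (a minimal NumPy version on synthetic data; real LSF training would use perceptually weighted distances, a large speech database, and a split initialization rather than the global centroid used here):

```python
import numpy as np

def kmeans(data, codebook, iters=20):
    """Algorithm 1: alternate nearest-neighbour assignment and centroid update."""
    for _ in range(iters):
        dists = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        assign = np.argmin(dists, axis=1)
        codebook = np.array([data[assign == k].mean(axis=0)
                             if np.any(assign == k) else codebook[k]
                             for k in range(len(codebook))])
    return codebook

def split_vq_train(data, bits, eps=1e-3):
    """Algorithm 2: double the codebook each round by splitting every codevector."""
    codebook = data.mean(axis=0, keepdims=True)
    for _ in range(bits):
        codebook = np.vstack([codebook - eps, codebook + eps])
        codebook = kmeans(data, codebook)
    return codebook

rng = np.random.default_rng(1)
data = rng.normal(size=(500, 4))
cb = split_vq_train(data, bits=3)   # 2^3 = 8 codevectors
print(cb.shape)                      # (8, 4)
```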

Quantization and Coding Training/Split-VQ Many improved training algorithms exist; vector quantization was a very active field of research up to the 1990s. The classic book is Gersho and Gray, Vector Quantization and Signal Compression (1992). VQ remains the optimal approach in terms of efficiency. Decorrelation is a newer alternative approach, which attempts to extract orthogonal directions such that they can be modelled and quantized independently. It is almost optimal but has low complexity; it is not yet in widespread use.

Linear prediction Summary Linear prediction can be used to model the spectral envelope of speech signals. Line spectral frequencies are the most common and effective representation of linear predictors for quantization. Vector quantization and coding is applied to the line spectral frequencies.

Long-time Prediction Assumptions and Objectives Voiced sounds have a quasi-periodic structure, caused by the oscillations of the vocal folds. The fundamental frequency is assumed to be slowly changing. However, the rate of change is much faster than in most other types of audio: usually less than 10 octaves per second, but sometimes up to 15 octaves per second can be observed (about one semitone during a 5 ms sub-frame). The pitch range is usually something like 85 to 400 Hz, and the perceptual pitch resolution is roughly 2 Hz. The objective is to model a feature of speech production, the pitch, to enable coding with high efficiency. Experience has shown that long-time prediction is a very efficient tool for source modelling (it has a huge impact on SNR).

Long-time Prediction Vocabulary The fundamental frequency model is known by many names: long-time prediction (LTP), adaptive codebook (ACBK), fundamental frequency (F0) model, and impulse train.

Long-time Prediction Definition A long-time predictor (LTP) can be defined by the pitch lag T and the gain factor γ_P as F_0(z) = 1 − γ_P z^{-T}. In the time domain we can predict a future sample of x_n as x̂_n = γ_P x_{n−T}. With a vector x_k = [x_{kN}, x_{kN+1}, ..., x_{kN+N−1}]^T we thus obtain x̂_k = γ_P x_{k−T}.
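As a quick sanity check of this definition (a sketch using an artificial, perfectly periodic signal):

```python
import numpy as np

def ltp_predict(x, n0, N, T, gamma):
    """Predict the frame x[n0 : n0+N] as gamma * x[n0-T : n0-T+N]."""
    return gamma * x[n0 - T : n0 - T + N]

# A perfectly periodic signal with period 5: the lag T = 5 predicts it exactly.
period = np.array([0.3, -1.0, 0.7, 0.1, -0.4])
x = np.tile(period, 10)
frame = x[20:24]
residual = frame - ltp_predict(x, n0=20, N=4, T=5, gamma=1.0)
print(residual)   # all zeros: F0(z) = 1 - z^-5 removes the periodicity
```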

Long-time Prediction Codebook A long-time predictor can thus be interpreted as a vector codebook, where the codebook entries are x_{k−T} with different delays T. Since the past signal keeps changing from frame to frame, the codebook is signal-adaptive. This explains why long-time predictors are often known as the adaptive codebook.

Long-time Prediction Optimization We want to optimize the quality of the output signal x_k = H e_k, where x_k is a vector of the output signal and H is the convolution matrix corresponding to linear prediction. Perceptual weighting can be applied by multiplying with a weighting matrix W, such that our optimization task is min ‖B(e_k − ê_k)‖², where ê_k is the quantized residual and B = WH. We want to model the residual with long-time prediction, whereby ê_k = γ_P e_{k−T} and min ‖B(e_k − γ_P e_{k−T})‖² gives the optimal T and γ_P.

Long-time Prediction Optimization (advanced topic) The objective function has a multiplicative term γ_P e_{k−T}, whereby it has no simple analytic solution; we must use a trick to find the optimal parameters. The optimal value of γ_P can be found by setting the derivative to zero, 0 = (∂/∂γ_P) ‖B(e_k − γ_P e_{k−T})‖² = −2 e_{k−T}^T B^T B (e_k − γ_P e_{k−T}), whereby γ_P = (e_{k−T}^T B^T B e_k) / ‖B e_{k−T}‖².

Long-time Prediction Optimization (advanced topic) Given the optimal γ_P, we can modify the objective function by inserting this value of γ_P. Through simple manipulations we find that arg min_T ‖B(e_k − γ_P e_{k−T})‖² = arg max_T (e_{k−T}^T B^T B e_k)² / ‖B e_{k−T}‖². Optimizing this function gives us the optimal e_{k−T}, under the assumption that γ_P has its optimal value. Note that the above expression is the (squared) normalized correlation between B e_k and B e_{k−T}; that is, we are trying to find the vector e_{k−T} which is closest to the direction of e_k.
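A sketch of this lag search (for simplicity the weighting B is taken as the identity here, i.e. no perceptual weighting; a real codec would use B = WH):

```python
import numpy as np

def ltp_search(e, n0, N, T_min, T_max):
    """Find the lag T maximizing (e_{k-T}^T e_k)^2 / ||e_{k-T}||^2,
    then compute the optimal gain gamma_P for that lag."""
    e_k = e[n0 : n0 + N]
    best_T, best_score = None, -np.inf
    for T in range(T_min, T_max + 1):
        e_kT = e[n0 - T : n0 - T + N]
        den = e_kT @ e_kT
        if den == 0:
            continue
        score = (e_kT @ e_k) ** 2 / den     # squared normalized correlation
        if score > best_score:
            best_T, best_score = T, score
    e_kT = e[n0 - best_T : n0 - best_T + N]
    gamma = (e_kT @ e_k) / (e_kT @ e_kT)    # optimal gain for the chosen lag
    return best_T, gamma

rng = np.random.default_rng(2)
e = np.tile(rng.normal(size=7), 12)          # residual with period 7
T, gamma = ltp_search(e, n0=30, N=14, T_min=2, T_max=12)
print(T, gamma)   # 7 and 1.0: the true period maximizes the correlation
```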

Long-time Prediction Summary Long-time prediction is used to model the fundamental frequency. It can be implemented as a vector codebook, although it actually is a filter. The optimal e_{k−T} is found by maximizing the normalized correlation between e_k and e_{k−T}; the optimal γ_P can then be quantized directly. The long-time predictor thus requires the transmission of two parameters: the lag T and the quantized gain γ_P.

Residual coding Once the spectral envelope has been modelled with the linear predictor A(z) and the fundamental frequency with the long-time predictor F_0(z), we are left with a residual E(z) = X(z) A(z) F_0(z). The residual contains everything that the two predictors were unable to model; it is basically white noise. Noise has no structure, right? So how do we model noise? How can we model the structure of something that has none? What do we know about noise?

Residual coding Let ε_n be an uncorrelated, zero-mean white-noise signal: E[ε_n] = 0 and E[ε_n ε_k] = 0 for n ≠ k. There is nothing to model here, but the energy (variance) of the signal is σ² = E[ε_n²], and we can encode the energy of the noise! If e_k is a vector of the residual, we can encode its energy γ_C² = ‖e_k‖². The model is thus e_k = γ_C ẽ_k, where ‖ẽ_k‖² = 1. The remaining problem then reduces to encoding a vector ẽ_k with ‖ẽ_k‖² = 1.

Residual coding Distribution (advanced topic) We happen to know that speech signals are distributed more or less according to the Laplace distribution; this holds also for the residual vector e_k. The probability distribution of a Laplace-distributed vector is p(e_k) = C exp(−‖e_k‖₁ / b). Since the length of the vector e_k is encoded separately, it suffices to encode only vectors of a fixed length ‖e_k‖₁ = p. The probability of such vectors is p(e_k | ‖e_k‖₁ = p) = C exp(−p/b) = constant. All vectors with the same length thus have the same probability, and we can use a codebook which contains all vectors of that length.

Residual coding Algebraic coding Consider a vector with ‖e_k‖₁ = 1 which is quantized to integer values. The possible vectors are:
Index  ε_0  ε_1  ε_2  ε_3  ε_4  ε_5 ...
0      +1    0    0    0    0    0 ...
1      -1    0    0    0    0    0 ...
2       0   +1    0    0    0    0 ...
3       0   -1    0    0    0    0 ...
4       0    0   +1    0    0    0 ...
5       0    0   -1    0    0    0 ...
Here, clearly, we can encode the vectors with index = 2k + s, where k is the position of the pulse and s = 0 for a plus sign and s = 1 for a minus sign.

Residual coding Algebraic coding With ‖e_k‖₁ = 2 we can have either two separate pulses or two pulses at the same position, for example:
ε_0  ε_1  ε_2  ε_3  ε_4  ε_5 ...
+1    0    0    0   -1    0 ...
-1    0    0   +1    0    0 ...
 0   +2    0    0    0    0 ...
 0   -2    0    0    0    0 ...
 0    0    0   +1   -1    0 ...
Here it is a bit more challenging to generate indices for the above vectors.

Residual coding Algebraic coding In the case ‖e_k‖₁ = 2 we can use index = 2N k₁ + 2k₂ + s₁, where the k_n are the positions of the pulses, s₁ is the sign of the first pulse and N is the length of the vector. We can then deduce s₂ from the positions: if k₁ ≤ k₂ then s₂ = s₁, otherwise s₂ is the opposite sign. Example with N = 2:
Index  k1  k2  s1  s2  ε_0  ε_1
0      0   0   0   0   +2    0
1      0   0   1   1   -2    0
2      0   1   0   0   +1   +1
3      0   1   1   1   -1   -1
4      1   0   0   1   -1   +1
5      1   0   1   0   +1   -1
6      1   1   0   0    0   +2
7      1   1   1   1    0   -2

Residual coding Algebraic coding With N = 2 and ‖e_k‖₁ = 2 we thus have 8 different codebook vectors, which can be encoded with log₂ 8 = 3 bits. If we were to encode the same vector directly, with one bit for each position k₁ and k₂ and one bit for each sign s₁ and s₂, we would need 4 bits. The vector [+1, +1] could then be described either as k₁ = 0, k₂ = 1, s₁ = s₂ = 0 or as k₁ = 1, k₂ = 0, s₁ = s₂ = 0: two descriptions for one vector, which is inefficient. With the algebraic coding rule we get the smallest possible bit consumption.
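The two-pulse indexing rule above can be written out directly (a sketch; real ACELP codecs use more pulses and interleaved track structures):

```python
def encode_two_pulses(k1, k2, s1, N):
    """index = 2*N*k1 + 2*k2 + s1; the sign s2 is implied by the pulse order."""
    return 2 * N * k1 + 2 * k2 + s1

def decode_two_pulses(index, N):
    s1 = index % 2
    k2 = (index // 2) % N
    k1 = index // (2 * N)
    s2 = s1 if k1 <= k2 else 1 - s1    # opposite sign when the pulses are swapped
    vec = [0] * N
    vec[k1] += 1 if s1 == 0 else -1
    vec[k2] += 1 if s2 == 0 else -1
    return vec

# All 8 indices for N = 2 decode to distinct vectors with L1 norm 2:
for index in range(8):
    print(index, decode_two_pulses(index, N=2))
```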

Residual coding Algebraic coding The same approach can be extended to arbitrary N and p. Algebraic coding can then be used to describe noise vectors e_k of any length p with the minimum number of bits. Since the method is based on an algebraic rule (an algorithm), the noise codebook does not need to be stored: no storage is needed. Moreover, the codevectors are sparse (if N is large and p is low, then e_k is mostly zeros), whereby computations with e_k are simple to perform (low complexity).

Residual coding Analysis by synthesis To find the optimal quantization ê_k, we use the same optimization as for the LTP: arg max (ê_k^T B^T B e_k)² / ‖B ê_k‖², where ê_k is the quantized residual. Note that H ê_k corresponds to the synthesised output signal, and B ê_k = WH ê_k is thus the perceptually weighted, synthesised output signal. The above optimization thus evaluates (analyses) the quality of the synthesised output signal; consequently, this method is known as the analysis-by-synthesis method. To find the optimum we have to evaluate every possible quantization of e_k, so this is a brute-force method.
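The brute-force search can be sketched as follows (B is the identity in this toy example; in a codec it would be the weighted synthesis matrix WH, and the codebook would be the algebraic pulse codebook):

```python
import numpy as np

def analysis_by_synthesis(codebook, e, B):
    """Return the index maximizing (c^T B^T B e)^2 / ||B c||^2 over the codebook."""
    target = B @ e
    best_k, best_score = None, -np.inf
    for k, c in enumerate(codebook):
        syn = B @ c                    # weighted synthesized signal for this entry
        den = syn @ syn
        if den == 0:
            continue
        score = (syn @ target) ** 2 / den
        if score > best_score:
            best_k, best_score = k, score
    return best_k

rng = np.random.default_rng(3)
e = rng.normal(size=8)
B = np.eye(8)                          # identity weighting for this sketch
codebook = [rng.normal(size=8) for _ in range(15)] + [0.5 * e]
print(analysis_by_synthesis(codebook, e, B))   # 15: the collinear entry wins
```

By the Cauchy–Schwarz inequality the score is maximal for a codevector collinear with e, regardless of its gain, so the last entry is selected.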

Residual coding Analysis by synthesis The final output is obtained as the sum of the contributions of the fundamental frequency model and the residual codebook, x_k = H(γ_P ê_{k−T} + γ_C ê_k), or equivalently as a time-domain convolution, x_n = h_n * (γ_P ê_{n−T} + γ_C ê_n). Here h_n is the impulse response of the linear predictor, which is excited (filtered) by the codebook vectors ê_{k−T} and ê_k. Hence the name code-excited linear prediction (CELP), and, when using the algebraic residual codebook, algebraic code-excited linear prediction (ACELP).

Residual coding Summary The residual after modelling with linear prediction and long-time prediction is modelled with a noise codebook. We first encode the gain (energy) of the noise vector, and secondly we encode the fixed-length residual with an algebraic codebook. The algebraic codebook generates vectors with an algorithm, such that no storage is required. The best quantization is found by a brute-force search, also known as the analysis-by-synthesis method.

Conclusion Speech coding is the digital compression of speech for transmission and storage. It is the most important speech processing application, with over 7 billion users. Most phones still use the old AMR standard, partly superseded by the newer AMR-WB, but the newest standard, EVS, will hopefully soon replace both. Code-excited linear prediction (CELP) is the standard approach, and it is used in all mainstream standards. It is based on source-filter modelling, where the spectral envelope is modelled with linear prediction, the fundamental frequency with a long-time predictor, and the residual with a noise codebook. Parameters are optimized with a brute-force analysis-by-synthesis method. Speech coding remains an active, math-intensive field of research.