High Rate Speech Service Option 17 for Wideband Spread Spectrum Communication Systems


Document: C.S000-0
Version: …
Date: December …

High Rate Speech Service Option 17 for Wideband Spread Spectrum Communication Systems

COPYRIGHT
3GPP2 and its Organizational Partners claim copyright in this document, and individual Organizational Partners may copyright and issue documents or standards publications in individual Organizational Partner's name based on this document. Requests for reproduction of this document should be directed to the 3GPP2 Secretariat. Requests to reproduce individual Organizational Partner's documents should be directed to that Organizational Partner. See … for more information.

High Rate Speech Service Option 17 for Wideband Spread Spectrum Communication Systems
Publish Version, November …

Copyright TIA.

PREFACE
These technical requirements form a standard for Service Option 17, a variable rate, two-way speech service option. The maximum speech coding rate of the service option is 13.3 kbps. This standard does not address the quality or reliability of Service Option 17, nor does it cover equipment performance or measurement procedures.

SECTION SUMMARY
1. General. This section defines the terms and numeric indicators used in this document.
2. Service Option 17: Variable Data Rate Two-Way Voice. This section describes the requirements for Service Option 17. Included in these requirements is the description of a speech codec algorithm for variable rate, two-way voice.
Annex A. Bibliography. This is an informative annex (not considered part of this standard) listing documents that may be useful in implementing the standard.

NOTES
1. TIA/EIA/IS-…, Recommended Minimum Performance Standard for the High Rate Speech Service Option 17 for Wideband Spread Spectrum Communication Systems, provides specifications and measurement methods.
2. "Base station" refers to the functions performed on the land side, which are typically distributed among a cell, a sector of a cell, and a mobile switching center.
3. Section 2 uses the following verbal forms: "Shall" and "shall not" identify requirements to be followed strictly to conform to the standard and from which no deviation is permitted. "Should" and "should not" indicate that one of several possibilities is recommended as particularly suitable, without mentioning or excluding others; that a certain course of action is preferred but not necessarily required; or that (in the negative form) a certain possibility or course of action is discouraged but not prohibited. "May" and "need not" indicate a course of action permissible within the limits of the standard. "Can" and "cannot" are used for statements of possibility and capability, whether material, physical, or causal.
4. Footnotes appear at various points in this specification to elaborate and further clarify items discussed in the body of the specification.
5. Unless indicated otherwise, this document presents numbers in decimal form. Binary numbers are distinguished in the text by the use of single quotation marks. In some tables, binary values may appear without single quotation marks if table notation clearly specifies that values are binary. The character 'x' is used to represent a binary bit of unspecified value. For example, 'xxx00010' represents any 8-bit binary value such that the least significant five bits equal '00010'. Hexadecimal numbers (base 16) are distinguished in the text by use of the form 0xh…h, where h…h represents a string of hexadecimal digits. For example, 0xfa represents a number whose binary value is '11111010' and whose decimal value is 250.

6. The following conventions apply to mathematical expressions in this standard:
• $\lfloor x \rfloor$ indicates the largest integer less than or equal to $x$: $\lfloor 1.1 \rfloor = 1$, $\lfloor 1.0 \rfloor = 1$.
• $\lceil x \rceil$ indicates the smallest integer greater than or equal to $x$: $\lceil 1.1 \rceil = 2$, $\lceil 2.0 \rceil = 2$.
• $|x|$ indicates the absolute value of $x$: $|-17| = 17$, $|17| = 17$.
• $\oplus$ indicates exclusive OR.
• $\min(x, y)$ indicates the minimum of $x$ and $y$.
• $\max(x, y)$ indicates the maximum of $x$ and $y$.
• In figures, $\otimes$ indicates multiplication. In formulas within the text, multiplication is implicit. For example, if $h(n)$ and $p_L(n)$ are functions, then $h(n)\,p_L(n) = h(n) \otimes p_L(n)$.
• $x \bmod y$ indicates the remainder after dividing $x$ by $y$: $x \bmod y = x - y \lfloor x/y \rfloor$.
• $\mathrm{round}(x)$ is traditional rounding: $\mathrm{round}(x) = \lfloor x + 0.5 \rfloor$.
• $\mathrm{sign}(x) = \begin{cases} 1, & x \ge 0 \\ -1, & x < 0 \end{cases}$
• $\sum$ indicates summation. If the summation symbol specifies initial and terminal values, and the initial value is greater than the terminal value, then the value of the summation is 0. For example, if $N = 0$ and $f(n)$ represents an arbitrary function, then $\sum_{n=1}^{N} f(n) = 0$.
• The bracket operator, [ ], isolates individual bits of a binary value. VAR[n] refers to bit n of the binary representation of the value of the variable VAR, such that VAR[0] is the least significant bit of VAR. The value of VAR[n] is either 0 or 1.
• This standard uses the two-sided z-transform, $F(z) = \sum_{i=-\infty}^{\infty} f(i)\, z^{-i}$. See Oppenheim, A. V. and Schafer, R. W., Digital Signal Processing.
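The numeric and bit conventions in notes 5 and 6 translate directly into small helpers, and two of them are easy to get wrong in C: the `%` operator and integer division truncate toward zero, so they disagree with $x \bmod y = x - y\lfloor x/y \rfloor$ for operands of opposite sign, and the C library `round()` rounds halves away from zero, whereas $\mathrm{round}(x) = \lfloor x + 0.5 \rfloor$ always rounds halves upward. The sketch below is illustrative only. All function names are hypothetical and are not taken from the standard or its master C simulation, and `floor_std` avoids `<math.h>` by assuming $|x| < 2^{63}$.

```c
#include <stdbool.h>
#include <string.h>

/* floor(x) for |x| < 2^63: truncate toward zero, then step down by one
 * for negative non-integers. */
double floor_std(double x)
{
    double t = (double)(long long)x;
    return (t > x) ? t - 1.0 : t;
}

/* ceil(x) via the identity ceil(x) = -floor(-x). */
double ceil_std(double x) { return -floor_std(-x); }

/* x mod y = x - y * floor(x / y). C's / truncates toward zero, so the
 * quotient is corrected when the signs differ. Assumes y != 0. */
long mod_std(long x, long y)
{
    long q = x / y;
    if (x % y != 0 && ((x < 0) != (y < 0)))
        q -= 1;
    return x - y * q;
}

/* round(x) = floor(x + 0.5): round_std(-2.5) is -2, while C's round()
 * returns -3. */
double round_std(double x) { return floor_std(x + 0.5); }

/* sign(x) = 1 for x >= 0 and -1 for x < 0, so sign(0) = 1. */
int sign_std(double x) { return (x >= 0.0) ? 1 : -1; }

/* VAR[n]: bit n of VAR, where VAR[0] is the least significant bit.
 * Assumes n is less than the bit width of unsigned. */
unsigned bit_of(unsigned var, unsigned n) { return (var >> n) & 1u; }

/* Does value match a pattern such as "xxx00010"? The first character is
 * the most significant bit, 'x' matches either bit value, and value must
 * fit in strlen(pattern) bits. */
bool matches_pattern(unsigned value, const char *pattern)
{
    size_t len = strlen(pattern);
    size_t width = sizeof(unsigned) * 8;
    if (len == 0 || len > width)
        return false;
    if (len < width && (value >> len) != 0)
        return false;
    for (size_t i = 0; i < len; i++) {
        char c = pattern[i];
        unsigned b = bit_of(value, (unsigned)(len - 1 - i));
        if (c == 'x')
            continue;
        if ((c != '0' && c != '1') || b != (unsigned)(c - '0'))
            return false;
    }
    return true;
}
```

For example, `mod_std(-7, 3)` returns 2, matching $-7 - 3\lfloor -7/3 \rfloor = -7 - 3(-3)$, while the C expression `-7 % 3` evaluates to -1.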

REFERENCES
The following standards contain provisions which, through reference in this text, constitute provisions of this Standard. At the time of publication, the editions indicated were valid. All standards are subject to revision, and parties to agreements based on this Standard are encouraged to investigate the possibility of applying the most recent editions of the standards indicated below. ANSI and TIA maintain registers of currently valid national standards published by them.

American National Standards:
1. ANSI/EIA/TIA-…, Acoustic-to-Digital and Digital-to-Acoustic Transmission Requirements for ISDN Terminals, March ….

Other Standards:
2. CCITT Recommendation G.711, Pulse Code Modulation (PCM) of Voice Frequencies, Vol. III, Geneva ….
3. CCITT Recommendation G.…, Separate Performance Characteristics for the Encoding and Decoding Sides of PCM Channels Applicable to …-Wire Voice-Frequency Interfaces, Blue Book, Vol. III, Melbourne ….
4. IEEE Standard …, IEEE Standard Methods for Measuring Transmission Performance of Analog and Digital Telephone Sets.
5. IEEE Standard …, Method for Determining Objective Loudness Ratings of Telephone Connections.
6. ANSI J-STD-008, Personal Station-Base Station Compatibility Requirements for 1.8 to 2.0 GHz Code Division Multiple Access (CDMA) Personal Communications Systems.
7. TIA/EIA/IS-95-A, Mobile Station-Base Station Compatibility Standard for Dual-Mode Wideband Spread Spectrum Cellular System. All references to TIA/EIA/IS-95-A shall be inclusive of text adopted by TSB74.
8. TIA/EIA/IS-…, Recommended Minimum Performance Standard for Digital Cellular Wideband Spread Spectrum Speech Service Option, May ….
9. TIA/EIA/IS-…, Recommended Minimum Performance Standard for the High Rate Speech Service Option 17 for Wideband Spread Spectrum Communication Systems.
10. TSB74, Telecommunications Systems Bulletin: Support for 14.4 kbps Data Rate and PCS Interaction for Wideband Spread Spectrum Cellular Systems, December ….

CONTENTS
GENERAL
Terms and Numeric Information
SERVICE OPTION 17: VARIABLE DATA RATE TWO-WAY VOICE
General Description
Service Option Number
Multiplex Option
Required Multiplex Option Support
Interface to Multiplex Option
Transmitted Packets
Received Packets
Service Negotiation
Initialization and Connection
Mobile Station Requirements
Base Station Requirements
Service Option Control Messages
Mobile Station Requirements
Base Station Requirements
Variable Rate Speech Coding Algorithm
Introduction
Input Audio Interface
Input Audio Interface in the Mobile Station
Conversion and Scaling
Digital Audio Input
Analog Audio Input
Adjusting the Transmit Level
Band Pass Filtering
Echo Return Loss
Input Audio Interface in the Base Station
Sampling and Format Conversion
Adjusting the Transmit Level
Echo Canceling
Ear Protection
Determining the Formant Prediction Parameters
Form of the Formant Synthesis Filter
Encoding
High-Pass Filtering of Input Samples
Windowing the Samples
Computing the Autocorrelation Function
Determining the LPC Coefficients from the Autocorrelation Function
Transforming the LPC Coefficients to Line Spectrum Pairs (LSPs)
Converting the LSP Frequencies to Transmission Codes for Rate 1, Rate 1/2, and Rate 1/4
Computing the Sensitivities of the LSP Frequencies
Vector Quantizing the LSP Frequencies
LSP VQ Codebooks
Converting the LSP Frequencies to Transmission Codes for Rate 1/8
Decoding LSP Frequencies and Converting to LPC Coefficients
Converting the LSP Transmission Codes to LSP Frequencies
Checking the Stability of the LSP Frequencies for Rate 1/… Encoding
Low-Pass Filtering the LSP Frequencies
Interpolating the LSP Frequencies
Converting the Interpolated LSP Frequencies to LPC Coefficients
Scaling the LPC Coefficients to Perform Bandwidth Expansion
Determining the Packet Type (Rate)
First Stage of Rate Determination Algorithm
Computing Band Energy
Calculating Rate Determination Thresholds
Comparing Thresholds
Performing Hangover
Constraining Rate Selection
Updating Smoothed Band Energy
Updating the Smoothed Band Energy
Updating Background Noise Estimate
Updating Signal Energy Estimate
Second Stage of Rate Determination Algorithm: Rate Reduction
Unvoiced Detection
Temporally Masked Frame Detection
Stationary Voiced Frame Detection
Adapting Thresholds to Achieve Target Average Rate
Determining the Pitch Prediction Parameters
Encoding
Computing the Pitch Lag and Pitch Gain
Implementing the Pitch Search Convolutions
Converting the Pitch Gain and Pitch Lag to the Transmission Codes
Decoding
Determining the Excitation Codebook Parameters
Encoding
Computing the Codebook Index and Codebook Gain for Rate 1 and Rate 1/2
Implementing the Codebook Search Convolutions
Computing the Codebook Gain for Rate 1/4 and Rate 1/8 Frames
Converting Codebook Parameters into Transmission Codes for Rate 1 and Rate 1/2
Converting Codebook Parameters into Transmission Codes for Rate 1/4
Converting Codebook Parameters into Transmission Codes for Rate 1/8
Decoding
Converting Codebook Transmission Codes for Rate 1 and Rate 1/2
Converting Codebook Transmission Codes for Rate 1/4
Converting Codebook Transmission Codes for Rate 1/8
Data Packing
Rate 1 Packing
Rate 1/2 Packing
Rate 1/4 Packing
Rate 1/8 Packing
Decoding at the Transmitting Speech Codec and the Receiving Speech Codec
Generating the Scaled Codebook Vector
Generating the Scaled Codebook Vector for Rate 1 and Rate 1/2
Generating the Scaled Codebook Vector for Rate 1/4
Generating the Scaled Codebook Vector for Rate 1/8
Generating the Pitch Synthesis Filter Output
Generating the Pitch Pre-Filter Synthesis Output
Generating the Formant Synthesis Filter Output
Updating the Memories of W(z) in the Transmitting Speech Codec
The Adaptive Postfilter in the Receiving Speech Codec
Special Cases
Insufficient Frame Quality (Erasure) Packets
Blank Packets
Incorrect Packet Detection
Initializing Speech Codec
Output Audio Interface
Output Audio Interface in the Mobile Station
Band Pass Filtering
Adjusting the Receive Level
Output Audio Interface in the Base Station
Adjusting the Receive Level
Summary of Encoding and Decoding
Encoding Summary
Decoding Summary
Allowable Delays
Allowable Transmitting Speech Codec Encoding Delay
Allowable Receiving Speech Codec Decoding Delay
Summary of Service Option 17 Notation
ANNEX A BIBLIOGRAPHY

FIGURES
Speech Synthesis Structure in the Receiving Speech Codec
Bit Allocation for a Rate 1 Packet
Bit Allocation for a Rate 1/2 Packet
Bit Allocation for a Rate 1/4 Packet
Bit Allocation for a Rate 1/8 Packet
Converting the LSP Frequencies to Transmission Codes for Rate 1/8
Converting the LSP Transmission Codes to LSP Frequencies for Rate 1/8 and Insufficient Frame Quality Frames
Two Stages in the Rate Determination Algorithm
Decimation of the Prediction Residual for NACF Computation
Flowchart for the Second Stage of the Rate Determination Algorithm
Histogram of Target_SNR Feature with Reference to Target_SNR_Threshold
Analysis-by-Synthesis Procedure for the Pitch Parameter Search
Analysis-by-Synthesis Procedure for Codebook Parameter Search
Converting Codebook Parameters for Rate 1 and Rate 1/2
Converting Codebook Parameters for Rate 1/4
Converting Codebook Parameters for Rate 1/8
Converting Codebook Transmission Codes for Rate 1 and Rate 1/2
Converting Codebook Transmission Codes for Rate 1/4
Converting Codebook Transmission Codes for Rate 1/8
Decoding at the Transmitting Speech Codec
Decoding at the Receiving Speech Codec

TABLES
Packet Types Supplied by Service Option 17 to the Multiplex Sublayer
Packet Types Supplied by the Multiplex Sublayer to Service Option 17
Valid Service Configuration Attributes for Service Option 17
Service Option Control Message Type-Specific Fields
Fraction of Packets at Rate 1, Rate 1/2, and Rate 1/4 with Rate Reduction
Parameters Used for Each Rate
Transmission Codes and Bit Allocations (Part 1 of 2)
Transmission Codes and Bit Allocations (Part 2 of 2)
Hamming Window Values WH(n)
LSP Vector Quantization for LSPVQ…
LSP Vector Quantization for LSPVQ… (Part 1 of 4)
LSP Vector Quantization for LSPVQ… (Part 2 of 4)
LSP Vector Quantization for LSPVQ… (Part 3 of 4)
LSP Vector Quantization for LSPVQ… (Part 4 of 4)
LSP Vector Quantization for LSPVQ…
LSP Vector Quantization for LSPVQ…
LSP Subframe Interpolation for All Rates
Valid Rate Modifications for the Rate Reduction Algorithm
FIR Filter Coefficients Used for Band Energy Calculations
Threshold Scale Factors as a Function of SNR
Hangover Frames as a Function of SNR
Impulse Response of LPF Used in the Decimation Process to Calculate the NACF
Unvoiced Encoding Rate as a Function of Reduced Rate Level
Definition of Terms for Pitch Search
Circular Codebook for Rate 1/2 Frames
Circular Codebook for Rate 1 Frames
Definition of Terms for Codebook Search
Codebook Quantizer (Rate 1, Rate 1/2, and Rate 1/4)
Codebook Quantizer (Rate 1, Every 4th Subframe)
Conversion for CBGAIN (Rate 1, Rate 1/2, and Rate 1/4)
Conversion for CBGAIN (Rate 1, Every 4th Subframe)
Conversion for CBSIGN for Rate 1 and Rate 1/2
Rate 1/… Frame Bits Used as the Seed for Pseudorandom Number Generation
Codebook Quantizer (Rate 1/8)
Conversion for CBGAIN (Rate 1/8)
Table for Conversion from CBSIGN to $\hat{G}_S$
Table for Conversion from CBGAIN to $\hat{G}$
Table for Conversion from $\hat{G}$ to $\hat{G}_a$
Rate 1 Packet Structure (Part 1 of 3)
Rate 1 Packet Structure (Part 2 of 3)
Rate 1 Packet Structure (Part 3 of 3)
Rate 1/2 Packet Structure
Rate 1/4 Packet Structure
Rate 1/8 Packet Structure
Impulse Response of BPF Used to Filter the White Excitation for Rate 1/… Synthesis
Gain Subtraction Value as a Function of Consecutive Erasures
Pitch Saturation Levels as a Function of Consecutive Erasures
LSP Predictor Decay as a Function of Consecutive Erasures
Summary of Service Option 17 Notation (Part 1 of 6)
Summary of Service Option 17 Notation (Part 2 of 6)
Summary of Service Option 17 Notation (Part 3 of 6)
Summary of Service Option 17 Notation (Part 4 of 6)
Summary of Service Option 17 Notation (Part 5 of 6)
Summary of Service Option 17 Notation (Part 6 of 6)


GENERAL
Terms and Numeric Information
Autocorrelation Function. A function showing the relationship of a signal with a time-shifted version of itself.
Base Station. A station in the Public Radio Telecommunications Service, other than a mobile station, used for radio communications with mobile stations.
CELP. See Code Excited Linear Predictive Coding.
Codec. The combination of an encoder and decoder in series (encoder/decoder).
Code Excited Linear Predictive Coding (CELP). A speech coding algorithm. CELP coders use codebook excitation, a long-term pitch prediction filter, and a short-term formant prediction filter.
Codebook. A set of vectors used by the speech codec. For each speech codec codebook subframe, one particular vector is chosen and used to excite the speech codec's filters. The codebook vector is chosen to minimize the weighted error between the original and synthesized speech after the pitch and formant synthesis filter coefficients have been determined.
Coder. Same as encoder.
Decoder. Generally, a device for the translation of a signal from a digital representation into an analog format. For this standard, a device that converts speech encoded in the format specified in this standard to analog or an equivalent PCM representation.
DECSD. Decoder Seed.
Encoder. Generally, a device for the translation of a signal into a digital representation. For this standard, a device that converts speech from an analog or its equivalent PCM representation to the digital representation described in this standard.
Formant. A resonant frequency of the human vocal tract causing a peak in the short-term spectrum of speech.
IIR Filter. An infinite-duration impulse response filter is a filter for which the output, in response to an impulse input, never totally converges to zero. This term is usually used in reference to digital filters.
Linear Predictive Coding (LPC). A method of predicting future samples of a sequence by a linear combination of the previous samples of the same sequence. Linear Predictive Coding is frequently used in reference to a class of speech codecs.
Line Spectral Pair (LSP). A representation of digital filter coefficients in a pseudo-frequency domain. This representation has good quantization and interpolation properties.
LPC. See Linear Predictive Coding.
LSB. Least significant bit.
LSP. See Line Spectral Pair.

MSB. Most significant bit.
Mobile Station. A station in the Public Radio Telecommunications Service intended to be used while in motion or during halts at unspecified points.
Normalized Autocorrelation Function (NACF). A measure used to determine the pitch period and the degree of periodicity of the input speech. This measure is useful in distinguishing voiced from unvoiced speech.
Packet. The unit of information exchanged between service option applications in the base station and the mobile station.
Pitch. The fundamental frequency in speech caused by the periodic vibration of the human vocal cords.
RDA. Rate Determination Algorithm.
Receive Objective Loudness Rating (ROLR). A measure of receive audio sensitivity. ROLR is a frequency-weighted ratio of the line voltage input signal to a reference encoder to the acoustic output of the receiver. IEEE Standard … defines the measurement of sensitivity, and IEEE Standard … defines the calculation of objective loudness rating.
SPL. Sound Pressure Level.
Transmit Objective Loudness Rating (TOLR). A measure of transmit audio sensitivity. TOLR is a frequency-weighted ratio of the acoustic input signal at the transmitter to the line voltage output of the reference decoder. IEEE Standard … defines the measurement of sensitivity, and IEEE Standard … defines the calculation of objective loudness rating.
Voiced Speech. Speech generated when the vocal cords are vibrating at a fundamental frequency. Characterized by high energy, periodicity, and a large ratio of energy below … kHz to energy above … kHz.
Unvoiced Speech. Speech generated by forcing air through constrictions in the vocal tract without vibration of the vocal cords. Characterized by a lack of periodicity and a near-unity ratio of energy below … kHz to energy above … kHz.
WAEPL. Weighted Acoustic Echo Path Loss. A measure of the echo performance under normal conversation. ANSI/EIA/TIA-… defines the measurement of WAEPL.
Zero Input Response (ZIR). The filter output caused by the non-zero initial state of the filter when no input is present.
Zero State Response (ZSR). The filter output caused by an input when the initial state of the filter is zero.
ZIR. See Zero Input Response.
ZSR. See Zero State Response.
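Several of the definitions above (Autocorrelation Function, Linear Predictive Coding, ZIR, and ZSR) name textbook signal-processing operations. The C sketch below illustrates them in generic double-precision form; it is not the procedure this standard specifies, since the encoder sections add windowing, bandwidth expansion, and a specific filter structure, and all function names are hypothetical. The Levinson-Durbin recursion shown here is one standard way to obtain predictor coefficients from an autocorrelation sequence, and the predictor sign convention $s(n) \approx \sum_{i=1}^{p} a_i\, s(n-i)$ is an assumption.

```c
#include <stddef.h>

/* Autocorrelation R(k) = sum_{i=k}^{n-1} s(i) s(i-k) for k = 0..p. */
void autocorr(const double *s, size_t n, double *r, size_t p)
{
    for (size_t k = 0; k <= p; k++) {
        double acc = 0.0;
        for (size_t i = k; i < n; i++)
            acc += s[i] * s[i - k];
        r[k] = acc;
    }
}

/* Levinson-Durbin recursion: finds a[1..p] so that s(n) is predicted by
 * sum_{i=1}^{p} a[i] s(n-i), a linear combination of previous samples.
 * a needs p + 1 entries (a[0] is unused). Returns the final prediction
 * error energy, or -1.0 for invalid input (p > 32 or r[0] <= 0) or if the
 * error energy stops being positive. */
double levinson(const double *r, double *a, size_t p)
{
    double prev[33];
    if (p > 32 || r[0] <= 0.0)
        return -1.0;
    double err = r[0];
    for (size_t i = 0; i <= p; i++)
        a[i] = 0.0;
    for (size_t i = 1; i <= p; i++) {
        double acc = r[i];
        for (size_t j = 1; j < i; j++)
            acc -= a[j] * r[i - j];
        double k = acc / err;              /* reflection coefficient */
        for (size_t j = 1; j < i; j++)
            prev[j] = a[j];
        for (size_t j = 1; j < i; j++)
            a[j] = prev[j] - k * prev[i - j];
        a[i] = k;
        err *= 1.0 - k * k;
        if (err <= 0.0)
            return -1.0;
    }
    return err;
}

/* First-order all-pole IIR filter y(n) = x(n) + c y(n-1), starting from
 * the state y(-1) = state. Passing x == NULL feeds zeros and yields the
 * zero input response (ZIR); passing state = 0.0 yields the zero state
 * response (ZSR). Returns the final state. */
double iir1(const double *x, double *y, size_t n, double c, double state)
{
    for (size_t i = 0; i < n; i++) {
        state = (x ? x[i] : 0.0) + c * state;
        y[i] = state;
    }
    return state;
}
```

Because the filter is linear, its full output is the sum of the ZIR and the ZSR. For example, with c = 0.5, initial state 2.0, and input {1, 0, 0}, the full output {2, 1, 0.5} equals the ZIR {1, 0.5, 0.25} plus the ZSR {1, 0.5, 0.25}. CELP encoders commonly exploit this by computing the ZIR once per subframe and evaluating candidate excitations through the ZSR alone.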

SERVICE OPTION 17: VARIABLE DATA RATE TWO-WAY VOICE
General Description
Service Option 17 provides two-way voice communications between the base station and the mobile station using the dynamically variable data rate speech codec algorithm described in this standard. The service option takes voice samples and generates an encoded speech packet for every Traffic Channel frame. The receiving station generates a speech packet from every Traffic Channel frame and supplies it to the service option for decoding into voice samples.
The two speech codecs communicate at one of four rates: Rate 1, Rate 1/2, Rate 1/4, and Rate 1/8.
In case of a discrepancy between the master C simulation and the algorithmic description, the master C simulation will prevail. The master C simulation is contained in the database of the performance specification for this algorithm, TIA/EIA/IS-….

Service Option Number
The variable data rate two-way voice service option using the speech codec algorithm described by this standard shall use service option number 17 and is called Service Option 17.

Multiplex Option
Required Multiplex Option Support
Service Option 17 shall support an interface with Multiplex Option 2 (see TIA/EIA/IS-…). Speech packets for Service Option 17 shall only be transported as primary traffic.

Interface to Multiplex Option
Transmitted Packets
The service option shall generate and supply exactly one packet to the multiplex sublayer every 20 ms. The packet contains the service option information bits, which are transmitted as primary traffic.

IS-95 (Mobile Station-Base Station Compatibility Standard for Dual-Mode Wideband Spread Spectrum Cellular System) and J-STD-008 (Personal Station-Base Station Compatibility Requirements for 1.8 to 2.0 GHz Code Division Multiple Access (CDMA) Personal Communications Systems) use the term frame to represent a 20 ms grouping of data on the Traffic Channel. Common speech codec terminology also uses the term frame to represent a quantum of processing. For Service Option 17, the speech codec frame corresponds to speech sampled over 20 ms. The speech samples are processed into a packet, and this packet is transmitted in a Traffic Channel frame.

The service option shall operate in one of two modes:

In the first mode, the packet supplied by the service option shall be one of the types shown in the table below. Upon command, the service option shall generate Blank packets. Also, upon command, the service option shall generate a non-blank packet with a maximum rate of Rate 1/….
In the second mode, the packet supplied by the service option shall be one of the types shown in the table below, excluding the Rate 1 packet. Upon command, the service option shall generate a Blank packet. Also, upon command, the service option shall generate a non-blank packet with a maximum rate of Rate 1/….

Table: Packet Types Supplied by Service Option 17 to the Multiplex Sublayer
Packet Type     Bits per Packet
Rate 1          266
Rate 1/2        124
Rate 1/4        54
Rate 1/8        20
Blank           0

Received Packets
The multiplex sublayer in the mobile station categorizes every received Traffic Channel frame and supplies the packet type and accompanying bits, if any, to the service option as shown in the table below. The service option processes the bits of the packet as described in this standard. The first five received packet types correspond to the transmitted packet types listed above. When the multiplex sublayer determines that a received frame is in error, it supplies an insufficient frame quality (erasure) packet to the service option.

Table: Packet Types Supplied by the Multiplex Sublayer to Service Option 17
Packet Type                              Bits per Packet
Rate 1                                   266
Rate 1/2                                 124
Rate 1/4                                 54
Rate 1/8                                 20
Blank                                    0
Insufficient frame quality (erasure)     0
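The packet-type rules above amount to a small lookup table plus a per-mode filter. The following C sketch assumes packet sizes of 266, 124, 54, and 20 bits for Rate 1 through Rate 1/8 (0 bits for Blank and erasure packets) and one packet every 20 ms. The enum and function names are hypothetical, and the command-driven limits (forced Blank packets and the reduced maximum rate) are deliberately not modeled.

```c
#include <stdbool.h>

typedef enum {
    PKT_RATE_1, PKT_RATE_1_2, PKT_RATE_1_4, PKT_RATE_1_8,
    PKT_BLANK, PKT_ERASURE
} so17_packet;

/* Bits per packet for each packet type. */
int so17_packet_bits(so17_packet t)
{
    switch (t) {
    case PKT_RATE_1:   return 266;
    case PKT_RATE_1_2: return 124;
    case PKT_RATE_1_4: return 54;
    case PKT_RATE_1_8: return 20;
    default:           return 0;   /* Blank and erasure carry no bits */
    }
}

/* May the service option supply packet type t in the given mode (1 or 2)?
 * Erasure packets exist only on the receive side, and the second mode
 * excludes the Rate 1 packet. */
bool so17_tx_allowed(int mode, so17_packet t)
{
    if (t == PKT_ERASURE || (mode != 1 && mode != 2))
        return false;
    if (mode == 2 && t == PKT_RATE_1)
        return false;
    return true;
}

/* Bit rate implied by one packet every 20 ms (50 packets/s), in bit/s. */
int so17_bit_rate(so17_packet t) { return so17_packet_bits(t) * 50; }
```

With these sizes, `so17_bit_rate(PKT_RATE_1)` evaluates to 13300 bit/s, which is the 13.3 kbps maximum speech coding rate stated in the preface.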

Service Negotiation
The mobile station and base station shall perform service negotiation for the service option as described in IS-95 or J-STD-008, and the negotiated service configuration shall include only valid attributes for the service option as specified in the table below.

Table: Valid Service Configuration Attributes for Service Option 17
Service Configuration Attribute     Valid Selections
Forward Multiplex Option            Multiplex Option 2
Reverse Multiplex Option            Multiplex Option 2
Forward Transmission Rates          Rate Set 2 with all four rates enabled
Reverse Transmission Rates          Rate Set 2 with all four rates enabled
Forward Traffic Type                Primary Traffic
Reverse Traffic Type                Primary Traffic

Initialization and Connection
Mobile Station Requirements
If the mobile station accepts a service configuration, as specified in a Service Connect Message, that includes a service option connection using the service option, the mobile station shall perform the following:
• If the service option connection is new (that is, not part of the previous service configuration), the mobile station shall perform speech codec initialization (see Initializing Speech Codec) at the action time associated with the Service Connect Message. The mobile station shall complete the initialization within …0 ms.
• Commencing at the action time associated with the Service Connect Message, and continuing for as long as the service configuration includes the service option connection, the service option shall process received packets and shall generate and supply packets for transmission as follows:
  - If the mobile station is in the Conversation Substate, the service option shall process the received packets and generate and supply packets for transmission in accordance with this standard.
  - If the mobile station is not in the Conversation Substate, the service option shall process the received packets in accordance with this standard, and shall generate and supply All Ones Rate 1/8 Packets for transmission, except when commanded to generate a Blank packet.
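Checking a negotiated configuration against the valid selections is a field-by-field comparison. The C sketch below assumes that the valid selections are Multiplex Option 2 in both directions, Rate Set 2 with all four rates enabled in both directions, and primary traffic in both directions. The struct layout and the rate-flag encoding are hypothetical; the actual service configuration record format is defined by IS-95 and J-STD-008 and is not reproduced here.

```c
#include <stdbool.h>

typedef enum { TRAFFIC_PRIMARY, TRAFFIC_SECONDARY } traffic_type;

/* Hypothetical flags for the enabled transmission rates. */
#define RATE_FLAG_1    0x1u
#define RATE_FLAG_1_2  0x2u
#define RATE_FLAG_1_4  0x4u
#define RATE_FLAG_1_8  0x8u
#define ALL_FOUR_RATES (RATE_FLAG_1 | RATE_FLAG_1_2 | RATE_FLAG_1_4 | RATE_FLAG_1_8)

/* Hypothetical in-memory view of a negotiated service configuration. */
typedef struct {
    int fwd_mux_option, rev_mux_option;   /* multiplex option numbers */
    int fwd_rate_set, rev_rate_set;       /* rate set numbers */
    unsigned fwd_rates, rev_rates;        /* enabled-rate flags */
    traffic_type fwd_traffic, rev_traffic;
} service_config;

/* True only if every attribute is a valid selection for Service Option 17. */
bool so17_config_valid(const service_config *c)
{
    return c->fwd_mux_option == 2 && c->rev_mux_option == 2 &&
           c->fwd_rate_set == 2 && c->rev_rate_set == 2 &&
           c->fwd_rates == ALL_FOUR_RATES && c->rev_rates == ALL_FOUR_RATES &&
           c->fwd_traffic == TRAFFIC_PRIMARY && c->rev_traffic == TRAFFIC_PRIMARY;
}
```

Requiring exact equality on the rate flags reflects the "all four rates enabled" wording. A configuration that disables any rate in either direction is rejected.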

Base Station Requirements
If the base station establishes a service configuration, as specified in a Service Connect Message, that includes a service option connection using the service option, the base station shall perform the following:
• If the service option connection is new (that is, not part of the previous service configuration), the base station shall perform speech codec initialization (see Initializing Speech Codec) no later than the action time associated with the Service Connect Message.
• Commencing at the action time associated with the Service Connect Message, and continuing for as long as the service configuration includes the service option connection, the service option shall process received packets and shall generate and supply packets for transmission in accordance with this standard.
The base station may defer enabling the audio input and output.

Service Option Control Messages
Mobile Station Requirements
The mobile station shall support one pending Service Option Control Message for the service option.
If the mobile station receives a Service Option Control Message for the service option, then, at the action time associated with the message, the mobile station shall process the message as follows:
1. If the MOBILE_TO_MOBILE field is equal to 1, the service option shall process each received Blank packet as an insufficient frame quality (erasure) packet. In addition, if the INIT_CODEC field is equal to 1, the service option should disable the audio output for 1 second after initialization.
   If the MOBILE_TO_MOBILE field is equal to 0, the service option shall process each received packet as described in Received Packets.
2. If the INIT_CODEC field is equal to 1, the mobile station shall perform speech codec initialization (see Initializing Speech Codec). The mobile station shall complete the initialization within …0 ms.
3. If the RATE_REDUC field is equal to a value defined in the rate reduction table (Fraction of Packets at Rate 1, Rate 1/2, and Rate 1/4 with Rate Reduction), the service option shall generate the fraction of those packets normally generated as Rate 1 packets at either Rate 1, Rate 1/2, or Rate 1/4, as specified by the corresponding line in that table. The service option shall continue to use these fractions until either of the following events occurs:
   - The mobile station receives a Service Option Control Message specifying a different RATE_REDUC, or
   - The service option is initialized.
   The service option may use the procedure defined in Second Stage of Rate Determination Algorithm: Rate Reduction to perform this rate reduction. This rate reduction mechanism is not deterministic; it depends upon the statistics of the input speech. The values in the rate reduction table are based upon the assumption that …0% of active speech is unvoiced. In reduced rate level …, unvoiced speech is encoded using Rate 1/…. In reduced rate levels … and …, unvoiced speech is encoded using Rate 1/…. In reduced rate level …, …0% of the voiced speech frames are encoded using Rate 1/2. The decision to encode an input voiced speech frame as Rate 1/2 or Rate 1 is made based upon the statistics of the input speech and the average encoding rate for active speech.
   If the RATE_REDUC field is not equal to a value defined in the rate reduction table, the mobile station shall reject the message by sending a Mobile Station Reject Order with the ORDQ field set equal to ….

Base Station Requirements
The base station may send a Service Option Control Message to the mobile station. If the base station sends a Service Option Control Message, the base station shall include the following type-specific fields for the service option:

Table: Service Option Control Message Type-Specific Fields
Field               Length (bits)
RATE_REDUC          3
RESERVED            3
MOBILE_TO_MOBILE    1
INIT_CODEC          1

RATE_REDUC - Rate reduction. The base station shall set this field to the RATE_REDUC value from the rate reduction table corresponding to the rate reduction that the mobile station is to perform.
RESERVED - Reserved bits. The base station shall set this field to 000.
MOBILE_TO_MOBILE - Mobile-to-mobile processing. If the mobile station is to perform mobile-to-mobile processing, the base station shall set this field to 1. In addition, if the mobile station is to disable the audio output of the speech codec for 1 second after initialization, the base station shall set both the INIT_CODEC field and the MOBILE_TO_MOBILE field to 1. If the mobile station is not to perform mobile-to-mobile processing, the base station shall set this field to 0.
INIT_CODEC - Initialize speech codec. If the mobile station is to initialize the speech codec (see Initializing Speech Codec), the base station shall set this field to 1; otherwise, the base station shall set this field to 0.
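The type-specific fields and the mobile station rules above fit in a few lines of C. This sketch assumes a 3-bit RATE_REDUC field, a 3-bit RESERVED field, and 1-bit MOBILE_TO_MOBILE and INIT_CODEC flags. It also assumes, purely for illustration, that the fields are packed most significant bit first, in table order, into a single octet; the real message encoding is defined by IS-95 and J-STD-008. Checking RATE_REDUC against the rate reduction table is left to the caller, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Type-specific fields of a Service Option Control Message. */
typedef struct {
    unsigned rate_reduc;     /* 3 bits */
    unsigned reserved;       /* 3 bits; the base station sets 000 */
    bool mobile_to_mobile;   /* 1 bit */
    bool init_codec;         /* 1 bit */
} so17_ctrl;

/* Assumed layout: RATE_REDUC | RESERVED | MOBILE_TO_MOBILE | INIT_CODEC,
 * most significant bit first, in one octet. */
uint8_t so17_ctrl_pack(const so17_ctrl *m)
{
    return (uint8_t)(((m->rate_reduc & 0x7u) << 5) |
                     ((m->reserved & 0x7u) << 2) |
                     ((m->mobile_to_mobile ? 1u : 0u) << 1) |
                     (m->init_codec ? 1u : 0u));
}

so17_ctrl so17_ctrl_unpack(uint8_t octet)
{
    so17_ctrl m;
    m.rate_reduc = (octet >> 5) & 0x7u;
    m.reserved = (octet >> 2) & 0x7u;
    m.mobile_to_mobile = ((octet >> 1) & 1u) != 0;
    m.init_codec = (octet & 1u) != 0;
    return m;
}

/* With MOBILE_TO_MOBILE = 1, received Blank packets are processed as
 * erasures; real erasures are always processed as erasures. */
bool so17_treat_as_erasure(const so17_ctrl *m, bool is_blank, bool is_erasure)
{
    return is_erasure || (m->mobile_to_mobile && is_blank);
}

/* The audio output should be disabled for 1 second after initialization
 * only when both MOBILE_TO_MOBILE and INIT_CODEC are 1. */
bool so17_mute_after_init(const so17_ctrl *m)
{
    return m->mobile_to_mobile && m->init_codec;
}
```

Under this assumed layout, a message with RATE_REDUC = 2 and both flags set packs to 0x43.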

Table...-. Fraction of Packets at Rate 1, Rate 1/2, and Rate 1/4 with Rate Reduction

Columns: RATE_REDUC; Reduced Rate Mode Level; Average Encoding Rate for Active Speech (kbps); Fraction of Normally Rate 1 Packets to be Rate 1; Fraction of Normally Rate 1 Packets to be Rate 1/2; Fraction of Normally Rate 1 Packets to be Rate 1/4.

All other RATE_REDUC values are reserved.

Note: The Average Encoding Rate calculation uses channel rates of 14.4, 7.2, and 3.6 kbps for Rate 1, 1/2, and 1/4, respectively.

For a summary of Service Option 0x000 notation, see ....

Variable Rate Speech Coding Algorithm

Introduction

The speech codec uses a code excited linear predictive (CELP) coding algorithm. This technique uses a codebook to vector quantize the residual signal by means of an analysis-by-synthesis method. The speech codec produces a variable output data rate based upon speech activity. For typical two-way telephone conversations, the average data rate is reduced by a factor of two or more relative to the maximum data rate.

The overall speech synthesis (decoder) model is shown in Figure..-. First, a vector is taken from one of two sources, depending on the rate. For Rate 1/4 and Rate 1/8, a pseudorandom vector is generated. For all other rates, a vector specified by an index Î is taken from the codebook, which is a table of vectors. This vector is multiplied by a gain term Ĝ and is then filtered by the long-term pitch synthesis filter, whose characteristics are governed by the pitch parameters L̂ and b̂. The output of the pitch synthesis filter is processed by the pitch pre-filter. The pitch pre-filter parameters are the pitch lag, L̂, and an attenuated pitch gain coefficient, b̂', derived from b̂. The output of the pre-filter is filtered by the formant synthesis filter to reproduce the speech signal. The output of the formant synthesis filter is filtered by the adaptive postfilter, PF(z).

The speech codec encoding procedure involves determining the decoder input parameters that minimize the perceptual difference between the synthesized speech and the original speech. The selection process for each set of parameters is described in this section. The encoding procedure also includes quantizing the parameters and packing them into data packets for transmission. The speech codec decoding procedure involves unpacking the data packets, unquantizing the received parameters, and reconstructing the speech signal from these parameters. The reconstruction consists of filtering the scaled codebook vector, c_d(n), as shown in Figure..-.

[Figure..-. Speech Synthesis Structure in the Receiving Speech Codec. The input parameters are Î, Ĝ, L̂ and b̂, L̂ and b̂', and â_1, ..., â_10. The Codebook (all other rates) or the Pseudorandom Vector Generator (DECSD; Rate 1/4 or 1/8) feeds, through gain control, c_d(n) into the Pitch Synthesis Filter P(z). This produces p_d(n), which passes through the Pitch Pre-Filter P'(z) to give p_pre(n). The Formant Synthesis (LPC) Filter 1/A(z) then produces y_d(n), the Postfilter PF(z) produces pf(n), and a final gain control stage yields s_d(n), the output speech.]

The input speech is sampled at 8 kHz. This speech is broken down into 20 ms speech codec frames, each consisting of 160 samples. The formant synthesis (LPC) filter coefficients are updated once per frame, regardless of the data rate selected. The number of bits used to encode the LPC parameters is a function of the selected data rate. Within each frame, the pitch and codebook parameters are updated a varying number of times, depending upon the selected data rate. Table..- describes the parameters used for each rate.

The formant synthesis filter is also called the linear predictive coding (LPC) filter; its characteristics are governed by the filter coefficients â_1, ..., â_10.

Table..-. Parameters Used for Each Rate

Parameter | Rate 1 | Rate 1/2 | Rate 1/4 | Rate 1/8
Linear predictive coding (LPC) updates per frame | 1 | 1 | 1 | 1
Samples per LPC update, L_A | 160 (20 ms) | 160 (20 ms) | 160 (20 ms) | 160 (20 ms)
Bits per LPC update | … | … | … | …
Pitch updates (subframes) per frame | 4 | 4 | 0 | 0
Samples per pitch subframe, L_p | 40 (5 ms) | 40 (5 ms) | - | -
Bits per pitch update | … | … | - | -
Codebook updates (subframes) per frame | 16 | 4 | 5 | 1
Samples per codebook subframe, L_C | 10 (1.25 ms) | 40 (5 ms) | 32 (4 ms) | 160 (20 ms)
Bits per codebook update | …* | …* | …* | …*

*Note: Rate 1 uses … bits per codebook update in 12 of the 16 codebook subframes per frame and … bits per codebook update in four codebook subframes. Rate 1/4 uses five unsigned codebook gains, each …-bits long, for scaling the pseudorandom excitation. Rate 1/8 uses six bits for pseudorandom excitation, instead of using the codebook.

The components of each rate packet are shown in Figures..- through..-. In these figures, each LPC frame corresponds to one 160-sample frame of speech. The number in the LPC block of each figure is the number of bits used at that rate to encode the LPC coefficients. Each pitch block corresponds to a pitch update within the frame. The number in each pitch block is the number of bits used to encode the updated pitch parameters. For example, at Rate 1 the pitch parameters are updated four times, once for each quarter of the speech frame, each time using … bits to encode the new pitch parameters. Similarly, each codebook block corresponds to a codebook update within the frame, and the number in each codebook block is the number of bits used to encode the updated codebook parameters. For example, at Rate 1/2 the codebook parameters are updated four times, once for each quarter of the speech frame, each time using … bits to encode the parameters.
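The update counts in the table determine how each 160-sample frame is divided into subframes. The sketch below derives the sample range of each subframe from those counts. It assumes that subframes are contiguous, equal in length, and start at the first sample of the frame. The names `SUBFRAMES` and `subframe_bounds` are illustrative only and are not defined by this standard.

```python
SAMPLE_RATE = 8000   # samples per second
FRAME_LEN = 160      # one 20 ms frame

# (pitch subframes, codebook subframes) per frame, read from the table above
SUBFRAMES = {
    "1": (4, 16),
    "1/2": (4, 4),
    "1/4": (0, 5),
    "1/8": (0, 1),
}


def subframe_bounds(count, frame_len=FRAME_LEN):
    """Split one frame into `count` equal subframes; return [start, end) pairs."""
    if count == 0:
        return []
    size, rem = divmod(frame_len, count)
    if rem:
        raise ValueError("subframes must tile the frame exactly")
    return [(k * size, (k + 1) * size) for k in range(count)]


if __name__ == "__main__":
    for rate, (n_pitch, n_cb) in SUBFRAMES.items():
        start, end = subframe_bounds(n_cb)[0]
        ms = 1000 * (end - start) / SAMPLE_RATE
        print(f"Rate {rate}: {n_pitch} pitch, {n_cb} codebook subframes "
              f"of {end - start} samples ({ms:g} ms)")
```

Under this assumption, the fourth Rate 1 pitch subframe covers samples 120 through 159.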

Figure..-. Bit Allocation for a Rate 1 Packet
Figure..-. Bit Allocation for a Rate 1/2 Packet
Figure..-. Bit Allocation for a Rate 1/4 Packet
Figure..-. Bit Allocation for a Rate 1/8 Packet
[Each figure shows the LPC frame, the pitch subframes, the codebook subframes, any reserved bits, and the total number of bits in the packet.]

Table..- lists all the parameter codes transmitted for each rate packet. Each parameter is described below:

LSPi: Line Spectral Pair frequency i.
LSPVi: Line Spectral Pair frequencies grouped into five vectors of dimension two.
PLAGi: Pitch Lag for the ith pitch subframe.
PFRACi: Fractional Pitch Lag for the ith pitch subframe.
PGAINi: Pitch Gain for the ith pitch subframe.
CBINDEXi: Codebook Index for the ith codebook subframe.
CBGAINi: Unsigned Codebook Gain for the ith codebook subframe.
CBSEED: Random Seed for Rate 1/8 packets.
CBSIGNi: Sign of the Codebook Gain for the ith codebook subframe.

This standard refers to the LSB of a particular code as CODE[0] and to the more significant bits as CODE[1], CODE[2], etc. For example, if a six-bit code LSPVi equals '001011' in binary for a maximum rate frame, then LSPVi[0] = 1, LSPVi[1] = 1, LSPVi[2] = 0, LSPVi[3] = 1, LSPVi[4] = 0, and LSPVi[5] = 0.
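The bit-numbering convention (CODE[0] is the LSB) can be checked with a short sketch. The helper name `code_bits` is hypothetical. The example reproduces the six-bit value '001011' from the text.

```python
def code_bits(value, width):
    """Return [CODE[0], CODE[1], ..., CODE[width-1]], where CODE[0] is the LSB."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    return [(value >> k) & 1 for k in range(width)]


lspv = int("001011", 2)      # binary '001011'
print(code_bits(lspv, 6))    # CODE[0] .. CODE[5]
```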

Table..-. Transmission Codes and Bit Allocations (Part 1 of 2)

Columns: Code; number of bits at Rate 1, Rate 1/2, Rate 1/4, and Rate 1/8.
Codes: LSP1 through LSP10, LSPV1 through LSPV5, PLAG1 through PLAG4, PFRAC1 through PFRAC4, PGAIN1 through PGAIN4, CBSEED, CBINDEX1 through CBINDEX16, and CBGAIN1 through CBGAIN16.

Table..-. Transmission Codes and Bit Allocations (Part 2 of 2)

Columns: Code; number of bits at Rate 1, Rate 1/2, Rate 1/4, and Rate 1/8.
Codes: CBSIGN1 through CBSIGN16.

Input Audio Interface

Input Audio Interface in the Mobile Station

The input audio may be either an analog or a digital signal.

Conversion and Scaling

The speech shall be sampled at a rate of 8000 samples per second. The speech shall be quantized to a uniform PCM format with at least 13 magnitude bits of dynamic range. The quantities in this standard assume a 14-bit integer input quantization with a range of ±8031. The following speech codec discussion assumes this 14-bit integer quantization. If the speech codec uses a different quantization, appropriate scaling should be used.

Digital Audio Input

If the input audio is an 8-bit μ-law PCM signal, it shall be converted to a uniform PCM format according to Table … in CCITT Recommendation G.711, Pulse Code Modulation (PCM) of Voice Frequencies.

Analog Audio Input

If the input is in analog form, the mobile station shall sample the analog speech and shall convert the samples to a digital format for speech codec processing. This shall be done by the following method or by an equivalent method. First, the input audio gain is adjusted. Then, the signal is bandpass filtered to prevent aliasing. Finally, the filtered signal is sampled and quantized (see ...).

Adjusting the Transmit Level

The mobile station shall have a transmit objective loudness rating (TOLR) equal to -… dB when transmitting to a reference base station (see ...). The loudness ratings are described in IEEE Standard …, IEEE Standard Method for Determining Objective Loudness Ratings of Telephone Connections. Measurement techniques and tolerances are described in IS-…, Recommended Minimum Performance Standard for Wideband Spread Spectrum Digital Cellular System Speech Service Options.

Band Pass Filtering

Input anti-aliasing filtering shall conform to CCITT Recommendation G.714, Separate Performance Characteristics for the Encoding and Decoding Sides of PCM Channels Applicable to 4-Wire Voice-Frequency Interfaces. The manufacturer may provide additional anti-aliasing filtering.

Echo Return Loss

Provision shall be made to ensure adequate isolation between the receive and transmit audio paths in all modes of operation. When no external transmit audio is present, acoustic coupling of the receive audio into the transmit audio path shall not cause the speech codec to generate packets at rates higher than Rate 1/8 (see ...). This applies specifically with the receive audio at full volume. Target levels of … dB WAEPL should be met. See ANSI/EIA/TIA Standard …, Acoustic-to-Digital and Digital-to-Acoustic Transmission Requirements for ISDN Terminals. Refer to the requirements stated in IS-…, Recommended Minimum Performance Standard for Wideband Spread Spectrum Digital Cellular System Speech Service Options.

Input Audio Interface in the Base Station

Sampling and Format Conversion

The base station converts the input speech (analog, μ-law companded Pulse Code Modulation, or another format) into a uniform quantized PCM format with at least 13 magnitude bits of dynamic range. The sampling rate is 8000 samples per second. The sampling and conversion process shall be as in ....

Adjusting the Transmit Level

The base station shall set the transmit level so that a 1004 Hz tone at a level of 0 dBm0 at the network interface produces a level … dB below the level of a sine wave whose peak is at the maximum quantization level. Measurement techniques and tolerances are described in IS-…, Recommended Minimum Performance Standard for Wideband Spread Spectrum Digital Cellular System Speech Service Options.
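The 8-bit μ-law to uniform PCM conversion used by the mobile station's digital input can be sketched as follows. The base station may also receive μ-law input. The sketch uses the widely used G.711 μ-law expansion and scales the result to the 14-bit range of ±8031 assumed above. The function name is hypothetical, and the normative values remain those of the G.711 tables.

```python
def ulaw_to_linear14(code):
    """Expand one 8-bit G.711 mu-law byte to a 14-bit uniform PCM sample (range +/-8031)."""
    if not 0 <= code <= 0xFF:
        raise ValueError("mu-law code must be in 0..255")
    u = ~code & 0xFF                  # mu-law characters are sent bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07        # segment number
    mantissa = u & 0x0F               # step within the segment
    magnitude = (((mantissa << 1) + 33) << exponent) - 33
    return -magnitude if sign else magnitude


print(ulaw_to_linear14(0x80), ulaw_to_linear14(0x00), ulaw_to_linear14(0xFF))
```

The commonly seen 16-bit form of this expansion, `((mantissa << 3) + 0x84) << exponent) - 0x84`, differs only by a factor of four.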
Echo Canceling

The base station shall provide a method to cancel echoes returned by the PSTN interface. The echo canceling function should provide at least … dB of echo return loss enhancement. The echo canceling function should work over a range of PSTN echo return delays from 0 to … ms. Because of the relatively long delays inherent in the speech coding and transmission processes, echoes that are not sufficiently suppressed are noticeable to the mobile station user.

Ear Protection

To protect the user from possible ear damage, the ear-piece acoustic output shall be limited so as not to exceed … dB SPL when placed to the ear. The output shall be measured in accordance with … of IEEE Standard …, Standard Method for Measuring Transmission Performance on Analog and Digital Telephone Sets.

Determining the Formant Prediction Parameters

Form of the Formant Synthesis Filter

The formant synthesis filter, which is similar to the traditional LPC formant synthesis filter, is the inverse of the formant prediction error filter. The prediction error filter is of tenth order (i.e., P is equal to 10) and has the transfer function

$$A(z) = 1 - \sum_{i=1}^{P} a_i z^{-i}$$

The formant synthesis filter has the transfer function

$$\frac{1}{A(z)} = \frac{1}{1 - \sum_{i=1}^{P} a_i z^{-i}}$$

The LPC coefficients, $a_i$, are computed from the input speech.

Encoding

The encoding process begins by determining the formant prediction parameters. This is performed by the following steps:

1. High-pass filter the input samples.
2. Window the filtered samples using a Hamming window.
3. Compute the values of the autocorrelation function corresponding to shifts from 0 to … samples.
4. Determine the LPC coefficients from the autocorrelation values.
5. Transform the LPC coefficients to LSP frequencies.
6. Convert the LSP frequencies into LSP codes (these codes are placed into the packet for transmission).
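The prediction error filter A(z) and the formant synthesis filter 1/A(z) defined above are exact inverses. The minimal direct-form sketch below shows this. The function names and the frame-to-frame `state` handling are illustrative assumptions, not requirements of this standard.

```python
def formant_synthesis(excitation, a, state=None):
    """Filter `excitation` through 1/A(z), where A(z) = 1 - sum_{i=1}^{P} a_i z^-i.

    Implements y(n) = x(n) + sum_{i=1}^{P} a_i y(n - i). `state` holds the last
    P outputs of the previous frame (most recent last); zeros if omitted.
    """
    P = len(a)
    hist = list(state) if state is not None else [0.0] * P
    y = []
    for x in excitation:
        acc = x + sum(a[i] * hist[-1 - i] for i in range(P))
        hist.append(acc)
        y.append(acc)
    return y, hist[-P:]


def prediction_error(speech, a, state=None):
    """Filter `speech` through A(z): e(n) = s(n) - sum_{i=1}^{P} a_i s(n - i)."""
    P = len(a)
    hist = list(state) if state is not None else [0.0] * P
    e = []
    for s in speech:
        e.append(s - sum(a[i] * hist[-1 - i] for i in range(P)))
        hist.append(s)
    return e, hist[-P:]
```

Passing a signal through `formant_synthesis` and then through `prediction_error` with the same coefficients returns the original signal, up to rounding.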

High-Pass Filtering of Input Samples

A high-pass digital filter is inserted into the input signal path for two purposes. It removes unwanted background and circuit noise, and it prevents a DC offset from artificially increasing R(0) (see ...) and thereby disrupting the rate decision algorithm (see ...). One possible high-pass filter for accomplishing these objectives is defined as

$$\mathrm{HPF}(z) = 0.\ldots \cdot \frac{z^2 - 2z + 1}{z^2 - 1.\ldots\, z + 0.\ldots}$$

Windowing the Samples

The high-pass filtered speech samples are windowed using a Hamming window that is centered at the center of the fourth Rate 1 pitch subframe. The window is 160 samples long (i.e., $L_A$ is equal to 160). Let s(n) be the input speech signal with the DC removed, where s(0) denotes the first sample of the current frame. The windowed speech signal is defined as

$$S_w(n) = s(n + 60)\, W_H(n), \quad 0 \le n \le L_A - 1$$

where the Hamming window, $W_H(n)$, is given in hexadecimal format in Table...-. Each value in the table has … fractional bits. Because of the 60-sample offset, the window of speech is centered between the 139th and 140th samples of the current 160-sample speech frame. The samples s(160 + i), 0 ≤ i ≤ 59, are the first 60 samples of the next speech frame.

Table...-. Hamming Window Values $W_H(n)$ (hexadecimal, listed by sample index n)
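The windowing step can be sketched under one loudly stated assumption. In place of the fixed-point hexadecimal table, the sketch uses the textbook floating-point Hamming window 0.54 - 0.46 cos(2πn/(L_A - 1)), so its values may differ slightly from the tabulated ones. The 60-sample offset and the 160-sample length follow the text.

```python
import math

L_A = 160     # analysis window length in samples
OFFSET = 60   # window starts at s(60) of the current frame


def hamming(length=L_A):
    """Floating-point Hamming window (assumed stand-in for the hex table W_H(n))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def window_speech(s, w=None):
    """Return S_w(n) = s(n + 60) W_H(n), 0 <= n <= L_A - 1.

    s[0] is the first sample of the current frame; s must extend through
    s(219), i.e. include the first 60 samples of the next frame.
    """
    if w is None:
        w = hamming()
    if len(s) < OFFSET + len(w):
        raise ValueError("need samples s(0) .. s(OFFSET + L_A - 1)")
    return [s[n + OFFSET] * w[n] for n in range(len(w))]
```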

Computing the Autocorrelation Function

Following the windowing operation, the kth value of the autocorrelation function is computed as

$$R(k) = \sum_{m=0}^{L_A - 1 - k} S_w(m)\, S_w(m + k), \quad 0 \le k \le \ldots$$

Only the first … values of the autocorrelation function, R(0) through R(…), need to be computed from the windowed speech signal within the analysis window. Of these, the first 11 values, R(0) through R(10), are required for LPC analysis. All of the computed values are used by the rate determination algorithm defined in ....

Determining the LPC Coefficients from the Autocorrelation Function

The LPC coefficients are obtained from the autocorrelation function. One method is Durbin's recursion, shown below:

    {
        E(0) = R(0)
        i = 1
        while (i <= P) {
            k_i = [ R(i) - sum_{j=1}^{i-1} a_j^(i-1) R(i - j) ] / E(i - 1)
            a_i^(i) = k_i
            j = 1
            while (j <= i - 1) {
                a_j^(i) = a_j^(i-1) - k_i a_{i-j}^(i-1)
                j = j + 1
            }
            E(i) = (1 - k_i^2) E(i - 1)
            i = i + 1
        }
    }

The LPC coefficients are

$$a_j = a_j^{(P)}, \quad 1 \le j \le P$$

See Rabiner, L. R. and Schafer, R. W., Digital Processing of Speech Signals (New Jersey: Prentice-Hall Inc., 1978), pp. …. The superscripts in parentheses denote the stage of Durbin's recursion. For example, $a_j^{(i)}$ refers to $a_j$ at the ith stage.
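The autocorrelation and Durbin's recursion above translate directly into the floating-point sketch below. The function names are illustrative. The sketch uses the sign convention A(z) = 1 - Σ a_i z^{-i}, so for a first-order autoregressive autocorrelation R(k) = ρ^k it returns a_1 = ρ and zeros elsewhere.

```python
def autocorrelation(sw, max_lag):
    """R(k) = sum_{m=0}^{L_A-1-k} S_w(m) S_w(m+k), for 0 <= k <= max_lag."""
    L = len(sw)
    return [sum(sw[m] * sw[m + k] for m in range(L - k))
            for k in range(max_lag + 1)]


def durbin(R, P=10):
    """Durbin's recursion; returns [a_1, ..., a_P] for A(z) = 1 - sum a_i z^-i."""
    if R[0] <= 0.0:
        raise ValueError("R(0) must be positive (silent or empty window)")
    E = R[0]
    a = [0.0] * (P + 1)        # a[j] holds a_j^(i); a[0] is unused
    for i in range(1, P + 1):
        k = (R[i] - sum(a[j] * R[i - j] for j in range(1, i))) / E
        prev = a[:]            # coefficients from stage i - 1
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        E *= 1.0 - k * k       # prediction error energy E(i)
    return a[1:]
```

In use, `durbin(autocorrelation(sw, 10))` yields the ten LPC coefficients from a windowed frame `sw`.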

Transforming the LPC Coefficients to Line Spectrum Pairs (LSPs)

The LPC coefficients are transformed into line spectrum pair frequencies. The prediction error filter transfer function, A(z), is given by

$$A(z) = 1 - a_1 z^{-1} - \cdots - a_{10} z^{-10}$$

where $a_i$, $1 \le i \le 10$, are the LPC coefficients as described earlier. Define two new transfer functions $P_A(z)$ and $Q_A(z)$ as

$$P_A(z) = A(z) + z^{-11} A(z^{-1}) = 1 + p_1 z^{-1} + \cdots + p_5 z^{-5} + p_5 z^{-6} + \cdots + p_1 z^{-10} + z^{-11}$$

and

$$Q_A(z) = A(z) - z^{-11} A(z^{-1}) = 1 + q_1 z^{-1} + \cdots + q_5 z^{-5} - q_5 z^{-6} - \cdots - q_1 z^{-10} - z^{-11}$$

where

$$p_i = -a_i - a_{11-i}, \quad 1 \le i \le 5$$

and

$$q_i = -a_i + a_{11-i}, \quad 1 \le i \le 5$$

The LSP frequencies are the ten roots that lie between ω = 0 and ω = 1.0 in the following two equations:

$$P'(\omega) = \cos(5\pi\omega) + p'_1 \cos(4\pi\omega) + \cdots + p'_4 \cos(\pi\omega) + \frac{p'_5}{2}$$

$$Q'(\omega) = \cos(5\pi\omega) + q'_1 \cos(4\pi\omega) + \cdots + q'_4 \cos(\pi\omega) + \frac{q'_5}{2}$$

where the parameters p' and q' are computed recursively from the parameters p and q as

$$p'_0 = q'_0 = 1$$

$$p'_i = p_i - p'_{i-1}, \quad 1 \le i \le 5$$

$$q'_i = q_i + q'_{i-1}, \quad 1 \le i \le 5$$
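The LSP frequencies can be found numerically from the p, p', q, and q' recursions above. This excerpt does not prescribe a root-finding method. The sketch below assumes a uniform grid search for sign changes of P'(ω) and Q'(ω), followed by bisection. The grid size and iteration count are illustrative parameters, not values from this standard. With all $a_i = 0$, the LSPs fall at ω_i = i/11, which makes a convenient check.

```python
import math


def lpc_to_lsp(a, grid=512, iters=60):
    """Return LSP frequencies w_1..w_P with 0 < w < 1 (w = 1 corresponds to pi radians).

    a: LPC coefficients [a_1, ..., a_P] for A(z) = 1 - sum a_i z^-i (P even).
    Roots of P'(w) and Q'(w) are bracketed on a uniform grid, then refined by
    bisection; grid and iters are assumptions, not part of the standard.
    """
    P = len(a)
    half = P // 2
    # p_i = -a_i - a_{P+1-i},  q_i = -a_i + a_{P+1-i},  1 <= i <= P/2
    p = [-a[i - 1] - a[P - i] for i in range(1, half + 1)]
    q = [-a[i - 1] + a[P - i] for i in range(1, half + 1)]
    # p'_0 = q'_0 = 1;  p'_i = p_i - p'_{i-1};  q'_i = q_i + q'_{i-1}
    pp, qq = [1.0], [1.0]
    for i in range(half):
        pp.append(p[i] - pp[-1])
        qq.append(q[i] + qq[-1])

    def series(c, w):
        # c_0 cos(5 pi w) + c_1 cos(4 pi w) + ... + c_4 cos(pi w) + c_5 / 2
        return (sum(c[k] * math.cos((half - k) * math.pi * w) for k in range(half))
                + c[half] / 2)

    def roots(c):
        found = []
        x0, f0 = 0.0, series(c, 0.0)
        for n in range(1, grid + 1):
            x1 = n / grid
            f1 = series(c, x1)
            if f0 * f1 < 0:                  # sign change brackets one root
                lo, hi, flo = x0, x1, f0
                for _ in range(iters):
                    mid = 0.5 * (lo + hi)
                    fm = series(c, mid)
                    if flo * fm <= 0:
                        hi = mid
                    else:
                        lo, flo = mid, fm
                found.append(0.5 * (lo + hi))
            x0, f0 = x1, f1
        return found

    lsp = sorted(roots(pp) + roots(qq))      # odd-indexed from P', even from Q'
    if len(lsp) != P:
        raise ValueError("root search missed a root; try a finer grid")
    return lsp
```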

Since the formant synthesis (LPC) filter is stable, the roots of the two functions alternate in the range from 0 to 1.0. Denote these ten roots as $\omega_1, \omega_2, \ldots, \omega_{10}$ in increasing order of magnitude. Then $\omega_i$ for i = 1, 3, 5, 7, 9 are roots of P'(ω), and $\omega_i$ for i = 2, 4, 6, 8, 10 are roots of Q'(ω).

Converting the LSP Frequencies to Transmission Codes for Rate 1, Rate 1/2, and Rate 1/4

For Rate 1, Rate 1/2, and Rate 1/4, a vector quantizer (VQ) is used to quantize the 10 LSP frequencies into … bits. The quantization procedure is described in the following subsections.

Computing the Sensitivities of the LSP Frequencies

Before quantization begins, the following algorithm is used to compute how sensitive each LSP is to quantization. The quantization process uses these sensitivity weightings to weight the quantization error in each LSP frequency appropriately.

First, obtain the set of values $J_i$, composed of $J_i(1)$ through $J_i(10)$, where i is the index of the LSP frequency of interest. The values are found by performing long division on $P_A(z)$ and $Q_A(z)$ as defined above. For the LSP frequencies with odd index ($\omega_1$, $\omega_3$, etc.), the long division is performed as

$$\frac{1 + p_1 z^{-1} + \cdots + p_1 z^{-10} + z^{-11}}{1 - 2\cos(\pi\omega_i)\, z^{-1} + z^{-2}} = J_i(1) + J_i(2) z^{-1} + \cdots + J_i(10) z^{-9}$$

and for the LSP frequencies with even index ($\omega_2$, $\omega_4$, etc.), the long division is performed as

$$\frac{1 + q_1 z^{-1} + \cdots - q_1 z^{-10} - z^{-11}}{1 - 2\cos(\pi\omega_i)\, z^{-1} + z^{-2}} = J_i(1) + J_i(2) z^{-1} + \cdots + J_i(10) z^{-9}$$

Next, compute the autocorrelations of the vectors $J_i$ using the following equation:

$$R_{J_i}(n) = \sum_{k=1}^{10-n} J_i(k)\, J_i(k+n), \quad 0 \le n < 10, \; 1 \le i \le 10$$

Finally, compute the sensitivity weights for the LSP frequencies. Cross-correlate the vectors $R_{J_i}$ with the autocorrelation vector computed from the speech (see Equation...-), and multiply the results by $\sin(\pi\omega_i)$. The final sensitivity weights, $SW_i$, are given by

$$SW_i = \sin(\pi\omega_i) \left( R(0)\, R_{J_i}(0) + 2.0 \sum_{k=1}^{9} R(k)\, R_{J_i}(k) \right), \quad 1 \le i \le 10$$

Use these weights, $SW_i$, to compute the weighted square error distortion metrics needed to search the LSP VQ codebooks, as described in the next subsection.
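The sensitivity-weight computation above can be sketched in floating point. The sketch performs synthetic division by the quadratic 1 - 2cos(πω_i)z^{-1} + z^{-2}, forms R_{J_i}(n), and cross-correlates with the speech autocorrelation R(0)..R(9). The helper names are hypothetical. The division remainder is returned only so that exactness can be checked, since it should vanish when ω_i is a true LSP.

```python
import math


def divide_by_quadratic(num, c):
    """Long division of num(z) (coefficients of z^0, z^-1, ...) by 1 - 2c z^-1 + z^-2.

    Returns (quotient, remainder); the remainder is ~0 when acos(c)/pi is an LSP.
    """
    rem = list(num)
    quot = []
    for k in range(len(num) - 2):
        qk = rem[k]
        quot.append(qk)
        rem[k + 1] += 2.0 * c * qk
        rem[k + 2] -= qk
    return quot, rem[-2:]


def sensitivity_weights(a, lsp, R):
    """SW_i for i = 1..10.

    a: [a_1..a_10]; lsp: [w_1..w_10] in (0, 1); R: speech autocorrelation R(0)..R(9).
    """
    p = [-a[i - 1] - a[10 - i] for i in range(1, 6)]
    q = [-a[i - 1] + a[10 - i] for i in range(1, 6)]
    PA = [1.0] + p + p[::-1] + [1.0]                  # symmetric P_A(z)
    QA = [1.0] + q + [-x for x in q[::-1]] + [-1.0]   # antisymmetric Q_A(z)
    weights = []
    for idx, w in enumerate(lsp):
        num = PA if idx % 2 == 0 else QA              # odd index i = idx + 1 uses P_A
        J, _ = divide_by_quadratic(num, math.cos(math.pi * w))
        RJ = [sum(J[k] * J[k + n] for k in range(10 - n)) for n in range(10)]
        cross = R[0] * RJ[0] + 2.0 * sum(R[k] * RJ[k] for k in range(1, 10))
        weights.append(math.sin(math.pi * w) * cross)
    return weights
```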

Vector Quantizing the LSP Frequencies

In the LSP VQ algorithm, the 10-dimensional LSP vector is partitioned into five 2-dimensional subvectors. Each of these 2-dimensional subvectors is quantized by a VQ, and the codebooks vary in size. Define $\omega_i$ as the ith LSP frequency and $\omega q_i$ as the quantized ith LSP frequency. The VQ codebook values are given in tables in .... Define $L_k(i,j)$ as the jth element of the kth vector in the ith VQ codebook. For example, $L_k(i,1)$ is the first element of the kth vector in codebook i, as listed in Table...-.

The vectors in the VQ codebooks are differential vectors. That is, the VQ codebooks contain possible values for the quantized differences in the LSP frequencies, given by $\Delta\omega_i = \omega_i - \omega_{i-1}$.

The five subvectors are quantized sequentially in the following manner. The first VQ codebook contains possible quantized values for $\Delta\omega_1 = \omega_1 - \omega_0 = \omega_1$ and $\Delta\omega_2 = \omega_2 - \omega_1$. The best vector in the first codebook is the one that minimizes the sensitivity-weighted error between the quantized and unquantized LSP frequencies in the first subvector. This error is computed by

$$\begin{aligned} \text{error} &= SW_1(\omega_1 - \omega q_1)^2 + SW_2(\omega_2 - \omega q_2)^2 \\ &= SW_1(\omega_1 - \Delta\omega q_1)^2 + SW_2\big(\omega_2 - (\Delta\omega q_1 + \Delta\omega q_2)\big)^2 \\ &= SW_1\big(\omega_1 - L_k(1,1)\big)^2 + SW_2\big(\omega_2 - (L_k(1,1) + L_k(1,2))\big)^2 \end{aligned}$$

This error function is computed for each codevector in the first LSP VQ codebook (i.e., 0 ≤ k < …). The codevector that results in the minimum error is selected, and the …-bit LSPV1 transmission code is set equal to the index of this codevector. Define the index of the best vector for the ith codebook as kbst(i). Once kbst(1) has been determined, the first two quantized LSP frequencies can be reconstructed from the first VQ codebook as

$$\omega q_1 = \Delta\omega q_1 = L_{kbst(1)}(1,1)$$

$$\omega q_2 = \Delta\omega q_1 + \Delta\omega q_2 = L_{kbst(1)}(1,1) + L_{kbst(1)}(1,2)$$

The remaining subvectors are quantized sequentially in a similar manner. The ith VQ codebook contains possible quantized values for $\Delta\omega_{2i-1} = \omega_{2i-1} - \omega_{2i-2}$ and $\Delta\omega_{2i} = \omega_{2i} - \omega_{2i-1}$.
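The best vector in the ith codebook is the one that minimizes the sensitivity-weighted error between the quantized and unquantized LSP frequencies in the ith subvector, computed by

$$\begin{aligned} \text{error} &= SW_{2i-1}(\omega_{2i-1} - \omega q_{2i-1})^2 + SW_{2i}(\omega_{2i} - \omega q_{2i})^2 \\ &= SW_{2i-1}\big(\omega_{2i-1} - (\omega q_{2i-2} + \Delta\omega q_{2i-1})\big)^2 + SW_{2i}\big(\omega_{2i} - (\omega q_{2i-2} + \Delta\omega q_{2i-1} + \Delta\omega q_{2i})\big)^2 \\ &= SW_{2i-1}\big(\omega_{2i-1} - (\omega q_{2i-2} + L_k(i,1))\big)^2 + SW_{2i}\big(\omega_{2i} - (\omega q_{2i-2} + L_k(i,1) + L_k(i,2))\big)^2 \end{aligned}$$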
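The sequential, sensitivity-weighted search over differential codebooks can be sketched as follows. The real codebook tables come from this standard. The tiny codebooks used here are hypothetical toy data that exercise the logic. Each subvector is offset by the previously reconstructed $\omega q_{2i-2}$, with $\omega q_0 = 0$, which matches the error expressions above.

```python
import math


def quantize_lsp(lsp, sw, codebooks):
    """Sequential split VQ of 10 LSPs with differential codebooks (sketch).

    lsp, sw: [w_1..w_10] and [SW_1..SW_10].
    codebooks: five lists of (L_k(i,1), L_k(i,2)) pairs.
    Returns (kbst, wq): chosen index per codebook and the quantized LSPs.
    """
    kbst, wq = [], []
    base = 0.0                                  # wq_{2i-2}, with wq_0 = 0
    for i, cb in enumerate(codebooks):
        w1, w2 = lsp[2 * i], lsp[2 * i + 1]
        s1, s2 = sw[2 * i], sw[2 * i + 1]
        best_k, best_err = None, math.inf
        for k, (d1, d2) in enumerate(cb):
            q1 = base + d1                      # wq_{2i-1}
            q2 = q1 + d2                        # wq_{2i}
            err = s1 * (w1 - q1) ** 2 + s2 * (w2 - q2) ** 2
            if err < best_err:                  # first minimum wins on ties
                best_k, best_err = k, err
        d1, d2 = codebooks[i][best_k]
        wq += [base + d1, base + d1 + d2]
        base = wq[-1]
        kbst.append(best_k)
    return kbst, wq
```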
