The Comparison of Vector Quantization Algorithms in Fish Species Acoustic Voice Recognition Using Hidden Markov Model


Diponegoro A. D. 1) and Fawwaz Al Maki W. 1)
1) Department of Electrical Engineering, University of Indonesia, Indonesia

Abstract

Vector Quantization (VQ) is applied in fish acoustic voice recognition using the Hidden Markov Model (HMM) to reduce the memory requirement and the computation time. Three VQ algorithms were implemented in the fish voice recognition: Traditional K-Means Clustering, LBG (Linde, Buzo, and Gray), and Successive Binary Split. In the recognition process, the input fish voice waveform was converted to a discrete signal, and its spectral characteristics were extracted using Mel Frequency Cepstrum Coefficients (MFCC). The vector components of the fish voice spectrum were quantized using the three VQ algorithms, and the performance of these algorithms was examined during fish voice recognition by means of HMM. Based on the experimental results, the Successive Binary Split algorithm was the optimum algorithm because it had the highest accuracy compared to the other two algorithms. During recognition, the Successive Binary Split algorithm also required the lowest memory capacity and time consumption.

Keywords: Vector quantization, HMM, Fish acoustic voice

I. INTRODUCTION

Every kind of soniferous fish is able to produce a specific acoustic voice that distinguishes it from other species and signals its behaviour, such as courtship behaviour [1], mating behaviour [2], spawning behaviour [3] [4], and reproductive behaviour [5]. During recognition, the wave characteristics of the observed fish were compared to the wave characteristics of a number of fish voices in a database. When the number of fishes is large, searching the vector components in the database requires a long computation time.
To solve this problem, the nearest vector components of every spectrum were combined into one value called a centroid or codeword. The combination of several vector components into one codeword was performed by means of a VQ algorithm. Three VQ algorithms were considered: Traditional K-Means Clustering, LBG, and Successive Binary Split. The question is which of the three gives the optimum performance in terms of the smallest memory capacity, the shortest computation time and, most importantly, the highest recognition accuracy.

II. VECTOR QUANTIZATION

The vector components of the extracted fish voice spectra were mapped from a large vector space to a finite number of regions in that space. Each region is called a cluster, and the vector components in a cluster are called sample points. The nearest-neighbor sample points were quantized to a centroid, or codeword, by means of VQ (see Fig. 1). The distance between a sample point and its centroid is called the VQ distortion. Increasing the number of sample points made the VQ distortion smaller, which means that the accuracy became higher. For a given number of sample points (vector components), a small VQ distortion requires a large number of centroids, which means that the computation time becomes longer and the storage capacity becomes bigger. It also depends on the number of extracted waves. The relation between the VQ distortion and the acoustic waves depends on the number of extracted waves that were produced from the acoustic wave concerned. If the acoustic waves of the observed kinds of fish differed greatly from each other, the duration of the extracted waves was longer than when the acoustic waves of the kinds of fish were nearly the same. The VQ algorithm used therefore determines the performance of fish species recognition based on the acoustic voices that the fishes produce.
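The quantization step and the VQ distortion described above can be illustrated with a minimal Python sketch. It assumes the squared Euclidean distance as the distortion measure, which the paper does not state, and the function names and toy vectors are hypothetical, not data from the experiment.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def quantize(vector, codebook):
    """Return the index of the codeword nearest to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def average_distortion(vectors, codebook):
    """Mean squared distance from each vector to its nearest codeword."""
    total = sum(squared_distance(v, codebook[quantize(v, codebook)])
                for v in vectors)
    return total / len(vectors)


# Toy 2-D feature vectors (illustrative values only).
samples = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
small_codebook = [(5.0, 6.0)]                # one codeword
large_codebook = [(0.0, 1.0), (10.0, 11.0)]  # two codewords

d_small = average_distortion(samples, small_codebook)
d_large = average_distortion(samples, large_codebook)
```

On these toy vectors the two-codeword codebook gives a much smaller average distortion than the one-codeword codebook, at the cost of storing and searching more codewords, which is the trade-off discussed in this section.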
The VQ algorithms used in this paper are K-Means Clustering (Traditional K-Means Clustering), Successive Binary Split (Binary Split), and LBG (Linde, Buzo, and Gray).

A. K-Means Clustering algorithm [7]

The K-Means Clustering algorithm was used as the method to build the codewords. Its procedure is explained in the flow chart shown in Fig. 2.
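As a rough illustration of how a K-Means codebook can be built, the sketch below follows the standard textbook loop (cluster, compute distortion, update codewords, stop when the distortion no longer improves). It is not the authors' implementation: the squared Euclidean distance, the stopping threshold, and all function names are assumptions made for the example.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def kmeans_codebook(vectors, initial_codebook, threshold=1e-6, max_iter=100):
    """Refine `initial_codebook` until the distortion stops improving."""
    codebook = [tuple(c) for c in initial_codebook]
    previous = float("inf")
    distortion = previous
    for _ in range(max_iter):
        # Cluster step: assign every training vector to its nearest codeword.
        clusters = [[] for _ in codebook]
        distortion = 0.0
        for v in vectors:
            dists = [squared_distance(v, c) for c in codebook]
            best = dists.index(min(dists))
            clusters[best].append(v)
            distortion += dists[best]
        distortion /= len(vectors)
        # Stop when the improvement in distortion falls below the threshold.
        if previous - distortion < threshold:
            break
        previous = distortion
        # Update step: move each codeword to the centroid of its cluster
        # (an empty cluster keeps its old codeword).
        codebook = [centroid(pts) if pts else c
                    for pts, c in zip(clusters, codebook)]
    return codebook, distortion


# Toy training vectors and a hand-picked initial codebook (illustrative only).
samples = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
codebook, distortion = kmeans_codebook(samples, [(0.0, 0.0), (10.0, 10.0)])
```

The result depends on the initial codebook, which is one reason the splitting-based algorithms in the following subsections are of interest.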

Fig. 1. VQ processing [6]

Fig. 2. Flow chart of the K-Means Clustering algorithm

B. Successive Binary Split algorithm [7]

In the Binary Split algorithm, the initial codebook is set at a random value M. The procedure of the Successive Binary Split algorithm is shown in the flow chart in Fig. 3.

Fig. 3. Flow chart of the Successive Binary Split algorithm

C. LBG algorithm [7] [8]

The LBG algorithm procedure is shown in Fig. 4. Each current codebook $C_m$ is split according to the rule

$C_m^{+} = C_m (1 + \varepsilon)$
$C_m^{-} = C_m (1 - \varepsilon)$

where $\varepsilon$ is a splitting parameter (choose $\varepsilon = 0.01$).

Fig. 4. Flow chart of the LBG algorithm [8]
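The splitting idea can be sketched in Python as follows. The sketch assumes the usual LBG form of the rule, in which every codeword is scaled by (1 + ε) and (1 - ε), and it refines each split with a fixed number of nearest-neighbour and centroid updates. The distance measure, the iteration count, and the function names are assumptions for illustration rather than details taken from the paper.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def split(codebook, eps=0.01):
    """Replace every codeword c by the pair c*(1+eps) and c*(1-eps)."""
    doubled = []
    for c in codebook:
        doubled.append(tuple(x * (1 + eps) for x in c))
        doubled.append(tuple(x * (1 - eps) for x in c))
    return doubled


def refine(vectors, codebook, iterations=20):
    """Run a fixed number of nearest-neighbour / centroid updates."""
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for v in vectors:
            dists = [squared_distance(v, c) for c in codebook]
            clusters[dists.index(min(dists))].append(v)
        # An empty cluster keeps its old codeword.
        codebook = [centroid(pts) if pts else c
                    for pts, c in zip(clusters, codebook)]
    return codebook


def lbg_codebook(vectors, size, eps=0.01):
    """Grow a codebook by repeated splitting; `size` should be a power of two."""
    codebook = [centroid(vectors)]  # start from the global centroid
    while len(codebook) < size:
        codebook = refine(vectors, split(codebook, eps))
    return codebook


# Toy training vectors (illustrative values only).
samples = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
codebook = lbg_codebook(samples, size=2)
```

Because the split is multiplicative, a codeword at the origin would produce two identical children; practical implementations often add a small offset instead, which is a design choice outside the scope of this sketch.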

III. RECOGNITION PROCESSING

In the recognition process, the characteristics (vector components and HMM parameters) of the extracted wave of the observed fish acoustic voice were determined based on the characteristics in the database. The results of comparing the acoustic voice characteristics of the observed fish with those in the database were used to recognize the species name of the observed fish. The kind of fish with the highest log-probability value was used to decide the name of the observed fish. The block diagram of the recognition process is shown in Fig. 5.

Fig. 5. Recognition processing procedure (block diagram: fish acoustic voice, discrete signal processing, VQ, HMM for training, HMM for recognition, database, decision)

The HMM notation can be written as follows [9]:

$\lambda = (A, B, \pi)$ (1)

where
$A = \{a_{ij}\}$, $a_{ij} = P[q_{t+1} = j \mid q_t = i]$, is the state-transition probability distribution,
$B = \{b_j(k)\}$, $b_j(k) = P[o_t = v_k \mid q_t = j]$, is the observation symbol probability distribution,
$\pi = \{\pi_i\}$, $\pi_i = P[q_1 = i]$, is the initial state distribution.

The observation sequence is given by

$O = (o_1\, o_2 \ldots o_T)$ (2)

The state sequence is given by

$q = (q_1\, q_2 \ldots q_T)$ (3)

The HMM probability (log probability) is given by

$P(O \mid \lambda) = \sum_{q} P(O \mid q, \lambda)\, P(q \mid \lambda)$ (4)

where the probability of the observation sequence can be written as

$P(O \mid q, \lambda) = b_{q_1}(o_1)\, b_{q_2}(o_2) \cdots b_{q_T}(o_T)$ (5)

and the probability of a state sequence $q$ can be written as

$P(q \mid \lambda) = \pi_{q_1}\, a_{q_1 q_2}\, a_{q_2 q_3} \cdots a_{q_{T-1} q_T}$ (6)

IV. EXPERIMENT RESULT

The experiments used five kinds of fish acoustic voice:
- Cynoscion regalis drumming
- Cynoscion regalis chattering
- Conodon nobilis
- Opsanus tau
- Cynoscion jamaicensis

Each of the five kinds of fish acoustic voice was segmented into 60 bursts of the extracted wave in a certain time period. The training process was executed 12 times.
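For reference, Eqs. (4) to (6) of Section III and the highest-log-probability decision rule can be sketched as below. The sum over all state sequences in Eq. (4) grows exponentially with T, so the sketch also uses the standard forward recursion, which computes the same quantity efficiently. The paper does not say how P(O | λ) was evaluated, so the forward recursion is an assumption, and the model names and parameter values are hypothetical toy numbers.

```python
import math
from itertools import product


def forward_log_probability(A, B, pi, observations):
    """log P(O | lambda) for a discrete HMM via the forward recursion."""
    n = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in observations[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return math.log(sum(alpha))


def brute_force_probability(A, B, pi, observations):
    """Direct evaluation of Eq. (4): sum over every state sequence q."""
    n = len(pi)
    total = 0.0
    for q in product(range(n), repeat=len(observations)):
        # Eq. (5) and Eq. (6) multiplied together for this sequence q.
        p = pi[q[0]] * B[q[0]][observations[0]]
        for t in range(1, len(observations)):
            p *= A[q[t - 1]][q[t]] * B[q[t]][observations[t]]
        total += p
    return total


def recognise(models, observations):
    """Return the model name with the highest log-probability."""
    return max(models,
               key=lambda name: forward_log_probability(*models[name],
                                                        observations))


# Two hypothetical 2-state models over a 2-symbol VQ alphabet.
A = [[0.7, 0.3], [0.4, 0.6]]
pi = [0.6, 0.4]
models = {
    "species_a": (A, [[0.9, 0.1], [0.8, 0.2]], pi),  # mostly emits symbol 0
    "species_b": (A, [[0.1, 0.9], [0.2, 0.8]], pi),  # mostly emits symbol 1
}
observed = [0, 0, 1, 0]  # codeword indices produced by the VQ stage
best = recognise(models, observed)
```

This unscaled recursion underflows for long observation sequences; scaling or log-domain arithmetic is the usual remedy and is omitted here for brevity.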
In this experiment, three duration ranges were used for the extracted waves:
1) Duration less than 0.4 second
2) Duration between 0.6 and 2.3 seconds
3) The above durations combined into bursts of random duration

Three codebook dimensions were applied in this experiment:
1) 32 bit codebook
2) 64 bit codebook
3) 128 bit codebook

A. The accuracy level performance

The experiment measured the accuracy level of each VQ algorithm. The results are shown in Table I to Table III.

TABLE I. Accuracy level (%) of fish voice recognition for the Traditional K-Means Clustering algorithm
Codebook   Accuracy level (%): 0.4 s   2.3 s   Comb
, ,33 46,67 36, , ,67 26,67 36,67

TABLE II. Accuracy level (%) of fish voice recognition for the Successive Binary Split algorithm
Codebook   Accuracy level (%): 0.4 s   2.3 s   Comb
32 46,67 63,33 56, ,67 83, , ,67

Tables I to III show that the LBG algorithm was the most accurate of the three algorithms for the combination burst type, the largest codebook, and 10 cycles.

TABLE III. Accuracy level (%) of fish voice recognition for the LBG algorithm
Codebook   Accuracy level (%): 0.4 s   2.3 s   Comb
, , , ,67 86, ,67 46,67 56,67

B. Relative time consumption

The time consumption was measured as the computer time from entering the data until the results were completely displayed on the monitor. The relative time results of HMM training for each VQ algorithm are shown in Table IV. The table shows that the LBG algorithm consumed the smallest execution time.

TABLE IV. Execution time of each VQ algorithm for codebook and number of Baum-Welch iterations
Iteration   Codebook   Relative time consumption of HMM training: Trad. K-Means Clust.   LBG   Succ. Binary Split
32 42,05 34,75 37, ,56 51,84 53, ,53 86, ,79 108,14 111,76

C. VQ distortion

The VQ distortion for several codebooks and numbers of iterations is shown in Table V. The table shows that the VQ distortion became smaller for a bigger codebook and also for a bigger number of iterations.

TABLE V. VQ distortion for several codebooks and numbers of iterations
Iteration   Codebook   VQ distortion

Based on the above results, for the same number of repetitions and the same duration, increasing the codebook size mainly increased the recognition accuracy. This happened because increasing the number of codewords in a codebook makes the VQ distortion smaller, which means that the probability of error also becomes lower.

V. CONCLUSION

Based on the results, the LBG algorithm had the smallest execution time and was also the most accurate of the three algorithms for the combination burst type, the largest codebook, and 10 cycles.

REFERENCES

[1] Gerald, J. W., Sound production during courtship in six species of sunfish, Evolution 25: 75-87.
[2] Fine, M. L., Seasonal and geographical variation of the mating call of the oyster toadfish, Oecologia 36: 45-47, 1978.
[3] Lobel, Philip S., Sound produced by spawning fish, Environmental Biology of Fishes 33, 1992.
[4] Lugli, Marco, Pavan, Gianni, Torricelli, Patrizia, Bobbio, Laura, Spawning vocalization in male freshwater gobiids, Environmental Biology of Fishes 43.
[5] Stout, J. F., Sound communication during the reproductive behavior of Notropis analostanus, Amer. Midl. Nat. 94.
[6] Batri, Nadim, Robust Spectral Parameter Coding in Speech Recognition, Thesis, Department of Electrical Engineering, McGill University, Montreal, Canada.
[7] Parks, Thomas M., Vector Quantization Codebook Design Using Neural Networks, Air Force Office of Scientific Research (AFOSR/JSEP), December.
[8] Liu, Zhongmin, Yin, Qizhang, Zhang, Weimin, A Speaker Identification and Verification System, EEL6586 Final Project, 2002.
[9] Rabiner, L., Juang, Bing Hwang, Fundamentals of Speech Recognition, Prentice Hall, Inc., New Jersey, 1993.

TABLE IV. Accuracy level (%) for the various types of signal used as input
Baum-Welch   Algorithm   Accuracy level (%): B   P   C
10 23, ,67 26,67 26,67
