Unsupervised Vocabulary Induction

Infant Language Acquisition (Saffran et al., 1997)

8-month-old infants were exposed to a continuous stream of syllables composed of synthetic words (e.g., pabikumalikiwabufa). After only 2 minutes of exposure, the infants could distinguish words from non-words (e.g., pabiku vs. kumali).

Today: Unsupervised Vocabulary Induction
Vocabulary induction from unsegmented text
Vocabulary induction from the speech signal
Sequence alignment algorithms

Vocabulary Induction

Task: unsupervised learning of word boundary segmentation.
Simple case (unsegmented text): Ourenemiesareinnovativeandresourceful,andsoarewe. Theyneverstopthinkingaboutnewwaystoharmourcountryandourpeople,andneitherdowe.
More ambitious case: the same task starting from the raw speech signal.

Word Segmentation (Ando & Lee, 2000)

Key idea: for each candidate boundary, compare the frequency of the n-grams adjacent to the proposed boundary with the frequency of the n-grams that straddle it.

(Figure: a candidate boundary inside the character stream ...T I N G | E V I D..., with the non-straddling 4-grams s_1 and s_2 on either side of the boundary and the straddling 4-grams t_1, t_2, t_3.)

For N = 4, consider the 6 questions of the form "Is #(s_i) > #(t_j)?", where #(x) is the number of occurrences of x in the corpus.
Example: Is TING more frequent in the corpus than INGE?

Algorithm for Word Segmentation

Notation, for a candidate location k and n-gram order n:
s^n_1, s^n_2: the non-straddling n-grams immediately to the left and right of location k
t^n_j: the straddling n-gram with j characters to the right of location k
I_>(y, z): indicator function that is 1 when y > z, and 0 otherwise

1. Calculate the fraction of affirmative answers for each n in N:
   v_n(k) = (1 / (2(n-1))) * Σ_{i=1..2} Σ_{j=1..n-1} I_>(#(s^n_i), #(t^n_j))

2. Average the contributions of each n-gram order:
   V_N(k) = (1 / |N|) * Σ_{n in N} v_n(k)

Algorithm for Word Segmentation (Cont.)

Place a boundary at every location k such that either:
k is a local maximum: V_N(k) > V_N(k - 1) and V_N(k) > V_N(k + 1), or
V_N(k) >= t, a threshold parameter

(Figure: V_N(k) plotted over the locations of a sample character sequence, with the threshold t drawn as a horizontal line.)

Experimental Framework

Corpus: 150 megabytes of 1993 Nikkei newswire
Manual annotations: 50 sequences for a development set (parameter tuning) and 50 sequences for a test set
Baseline algorithms: the Chasen and Juman morphological analyzers (lexicons of 115,000 and 231,000 words)
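For concreteness, here is a minimal Python sketch of the boundary statistic (not the authors' implementation). The set of n-gram orders and the threshold value are illustrative assumptions, and the sketch counts n-grams over the same text being segmented, which is a simplification.

    from collections import Counter

    def ngram_counts(text, n):
        # Count every n-gram of order n in the unsegmented text.
        return Counter(text[i:i + n] for i in range(len(text) - n + 1))

    def v_n(counts, text, k, n):
        # Fraction of affirmative answers at location k for order n:
        # is a non-straddling n-gram more frequent than a straddling one?
        if k < n or k + n > len(text):
            return 0.0
        s1, s2 = text[k - n:k], text[k:k + n]                      # non-straddling
        straddling = [text[k - n + j:k + j] for j in range(1, n)]  # t_1 .. t_{n-1}
        yes = sum(counts[s] > counts[t] for s in (s1, s2) for t in straddling)
        return yes / (2 * (n - 1))

    def boundary_scores(text, orders=(2, 3, 4, 5, 6)):
        # V_N(k): average of v_n(k) over the chosen n-gram orders N (assumed set).
        counts = {n: ngram_counts(text, n) for n in orders}
        return [sum(v_n(counts[n], text, k, n) for n in orders) / len(orders)
                for k in range(len(text) + 1)]

    def place_boundaries(scores, threshold=0.8):
        # Boundaries at local maxima of V_N or wherever V_N meets the threshold.
        return [k for k in range(1, len(scores) - 1)
                if (scores[k] > scores[k - 1] and scores[k] > scores[k + 1])
                or scores[k] >= threshold]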

Evaluation

Precision (P): the percentage of proposed brackets that exactly match word-level brackets in the annotation.
Recall (R): the percentage of word-level annotation brackets that are proposed by the algorithm.
F = 2PR / (P + R)
Result: F = 82%, an improvement of 1.38% over Juman and of 5.39% over Chasen.

Performance on other datasets (Cheng & Mitzenmacher):
Orwell (English)       79.8
Song lyrics (Romaji)   67.6
Goethe (German)        75.2
Verne (French)         72.9
Arrighi (Italian)      73.1

Today: Unsupervised Vocabulary Induction
Vocabulary induction from unsegmented text
Vocabulary induction from the speech signal
Sequence alignment algorithms

Aligning Two Sequences

Given two possibly related strings S1 and S2, find the longest common subsequence.
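The longest-common-subsequence objective can be computed with the same dynamic-programming pattern used for the alignment algorithms below; a minimal sketch (my illustration, not part of the lecture):

    def lcs_length(s1, s2):
        # L[i][j] = length of the longest common subsequence of s1[:i] and s2[:j].
        n, m = len(s1), len(s2)
        L = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                if s1[i - 1] == s2[j - 1]:
                    L[i][j] = L[i - 1][j - 1] + 1
                else:
                    L[i][j] = max(L[i - 1][j], L[i][j - 1])
        return L[n][m]

    # lcs_length("AGTC", "ATC") == 3   (common subsequence "ATC")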

How Can We Compute the Best Alignment?

We need a scoring system for ranking alignments.

Substitution cost:
      A     G     T     C
A    1.0   0.5  -1.0  -1.0
G   -0.5   1.0  -1.0  -1.0
T   -1.0  -1.0  +1.0  -0.5
C   -1.0  -1.0  -0.5   1.0

Gap (insertion & deletion) cost: a fixed penalty per gap symbol.

Key insight: the score is additive, so we can compute the best alignment recursively. For a given aligned pair (i, j), the best alignment is:
best alignment of S1[1...i] and S2[1...j] + best alignment of S1[i...n] and S2[j...m]

Can We Simply Enumerate All Possible Alignments?

Naive enumeration is prohibitively expensive. The number of possible alignments is

   binom(n + m, m) = (m + n)! / ((m!)^2) ≈ 2^{m+n}   (when n = m)

n = m   Number of alignments
10      184,756
20      1.4E+11
100     9.00E+58

Alignment using dynamic programming can be done in O(n·m) time.

Alignment Matrix

The alignment of two sequences can be modeled as the task of finding the path with the highest weight through a matrix.

(Figure: an example alignment of two short strings and the corresponding path through the alignment matrix.)
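A small sketch of the additive scoring idea, using the substitution scores from the table above; the gap penalty value is an assumption for illustration:

    # Substitution scores from the table above; GAP is an assumed per-gap penalty.
    SUB = {
        ('A', 'A'): 1.0,  ('A', 'G'): 0.5,  ('A', 'T'): -1.0, ('A', 'C'): -1.0,
        ('G', 'A'): -0.5, ('G', 'G'): 1.0,  ('G', 'T'): -1.0, ('G', 'C'): -1.0,
        ('T', 'A'): -1.0, ('T', 'G'): -1.0, ('T', 'T'): 1.0,  ('T', 'C'): -0.5,
        ('C', 'A'): -1.0, ('C', 'G'): -1.0, ('C', 'T'): -0.5, ('C', 'C'): 1.0,
    }
    GAP = 1.0

    def alignment_score(a, b):
        # a and b are equal-length aligned strings where '-' marks a gap;
        # the total score is the sum of per-column scores (additivity).
        assert len(a) == len(b)
        score = 0.0
        for x, y in zip(a, b):
            if x == '-' or y == '-':
                score -= GAP
            else:
                score += SUB[(x, y)]
        return score

    # Example: alignment_score("AG-TC", "AGCTC") == 1 + 1 - 1 + 1 + 1 == 3.0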

Global Alignment: Needleman-Wunsch Algorithm

To align two strings x and y, we construct a matrix F.
F(i, j): the score of the best alignment between the initial segment x_1...i of x up to x_i and the initial segment y_1...j of y up to y_j.
We compute F recursively, starting from F(0, 0) = 0.

Dynamic Programming Formulation

s(x_i, y_j): similarity between x_i and y_j
d: gap penalty

F(i, j) = max { F(i-1, j-1) + s(x_i, y_j),   F(i-1, j) - d,   F(i, j-1) - d }

Boundary conditions:
F(i, 0) = -i·d, representing the prefix x_1...i aligned to gaps in y
F(0, j) = -j·d, representing the prefix y_1...j aligned to gaps in x

We know how to compute the best score: it is the number in the bottom-right entry, F(n, m). But we also need to remember where each value came from: keep a pointer to the choice made at each step and retrace the path through the matrix. Time: O(m·n).

Local Alignment: Smith-Waterman Algorithm

Global alignment: find the best match between the sequences from one end to the other.
Local alignment: find the best match between subsequences of the two sequences. Useful for comparing highly divergent sequences when only local similarity is expected.
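A minimal Python sketch of Needleman-Wunsch with pointer-based traceback (a generic implementation, not the lecture's code); the match/mismatch scores in the usage comment are assumed values:

    def needleman_wunsch(x, y, s, d):
        # Global alignment. s(a, b) is the substitution score, d the (positive)
        # gap penalty. Returns (best score, aligned x, aligned y).
        n, m = len(x), len(y)
        F = [[0.0] * (m + 1) for _ in range(n + 1)]
        ptr = [[None] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            F[i][0], ptr[i][0] = -i * d, 'up'
        for j in range(1, m + 1):
            F[0][j], ptr[0][j] = -j * d, 'left'
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                choices = [(F[i-1][j-1] + s(x[i-1], y[j-1]), 'diag'),
                           (F[i-1][j] - d, 'up'),
                           (F[i][j-1] - d, 'left')]
                F[i][j], ptr[i][j] = max(choices)
        # Retrace the path of pointers from the bottom-right corner.
        ax, ay, i, j = [], [], n, m
        while i > 0 or j > 0:
            move = ptr[i][j]
            if move == 'diag':
                ax.append(x[i-1]); ay.append(y[j-1]); i, j = i - 1, j - 1
            elif move == 'up':
                ax.append(x[i-1]); ay.append('-'); i -= 1
            else:
                ax.append('-'); ay.append(y[j-1]); j -= 1
        return F[n][m], ''.join(reversed(ax)), ''.join(reversed(ay))

    # Example with assumed scores (+1 match, -1 mismatch, gap penalty 1):
    # needleman_wunsch("GATTACA", "GCATGCT", lambda a, b: 1 if a == b else -1, 1)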

Dynamic Programming Formulation (Smith-Waterman)

F(i, j) = max { 0,   F(i-1, j-1) + s(x_i, y_j),   F(i-1, j) - d,   F(i, j-1) - d }

Boundary conditions: F(i, 0) = F(0, j) = 0

Finding the best local alignment: find the highest value of F(i, j) and start the traceback from there; the traceback ends when a cell with value 0 is reached.

Local vs. Global Alignment

(Figure: a worked example comparing global and local alignment of the same pair of strings: the similarity matrix, the global-alignment matrix with negative prefix scores along the borders, and the local-alignment matrix, whose entries are never negative.)

Today: Unsupervised Vocabulary Induction
Vocabulary induction from unsegmented text
Vocabulary induction from the speech signal
Sequence alignment algorithms

Finding Words in Speech

Traditional approaches to speech recognition are supervised: recognizers are trained on a large corpus of speech with corresponding transcripts, and during training the recognizer is also provided with a vocabulary. Is it possible to learn the vocabulary directly from the speech signal?
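The corresponding sketch for Smith-Waterman; again the scoring function in the usage comment is an assumed example, and the traceback assumes exact (e.g., integer-valued) score arithmetic:

    def smith_waterman(x, y, s, d):
        # Local alignment: like Needleman-Wunsch, but scores are clipped at 0
        # and the traceback starts at the best cell and stops at a 0 entry.
        n, m = len(x), len(y)
        F = [[0.0] * (m + 1) for _ in range(n + 1)]
        best, best_ij = 0.0, (0, 0)
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                F[i][j] = max(0.0,
                              F[i-1][j-1] + s(x[i-1], y[j-1]),
                              F[i-1][j] - d,
                              F[i][j-1] - d)
                if F[i][j] > best:
                    best, best_ij = F[i][j], (i, j)
        # Trace back from the highest-scoring cell until a 0 entry is reached.
        ax, ay = [], []
        i, j = best_ij
        while i > 0 and j > 0 and F[i][j] > 0:
            if F[i][j] == F[i-1][j-1] + s(x[i-1], y[j-1]):
                ax.append(x[i-1]); ay.append(y[j-1]); i, j = i - 1, j - 1
            elif F[i][j] == F[i-1][j] - d:
                ax.append(x[i-1]); ay.append('-'); i -= 1
            else:
                ax.append('-'); ay.append(y[j-1]); j -= 1
        return best, ''.join(reversed(ax)), ''.join(reversed(ay))

    # Example with assumed scores (+2 match, -1 mismatch, gap penalty 1):
    # smith_waterman("PAWHEAE", "HEAGAWGHEE", lambda a, b: 2 if a == b else -1, 1)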

Vocabulary Induction: Outline

Spectral Vectors

A spectral vector is a vector in which each component is a measure of the energy in a particular frequency band.
We divide the acoustic signal (a one-dimensional waveform) into short overlapping intervals (25 msec windows with 15 msec overlap) and convert each window with a Fourier transform.

Comparing Acoustic Signals

Example of Spectral Vectors

(Figure: spectrograms, frequency 0-6000 Hz against time in seconds, of two utterances: "he too was diagnosed with paranoid schizophrenia" and "were willing to put Nash's schizophrenia on record".)
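A hedged NumPy sketch of the windowing step described above; the 16 kHz sample rate and the Hamming taper are assumptions not stated in the lecture:

    import numpy as np

    def spectral_vectors(signal, sample_rate=16000, win_ms=25, overlap_ms=15):
        # Slice the waveform into 25 ms windows overlapping by 15 ms (10 ms hop)
        # and take the magnitude of the FFT of each window: one spectral vector
        # per window, each component measuring energy in one frequency band.
        win = int(sample_rate * win_ms / 1000)
        hop = int(sample_rate * (win_ms - overlap_ms) / 1000)
        frames = [signal[start:start + win]
                  for start in range(0, len(signal) - win + 1, hop)]
        window = np.hamming(win)  # taper each frame (assumed choice)
        return np.array([np.abs(np.fft.rfft(frame * window)) for frame in frames])

    # Result shape: (number of windows, win // 2 + 1)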

Comparing Spectral Vectors: Computing Local Alignment

Divide the acoustic signal into word segments based on pauses.
Compute spectral vectors for each segment.
For each pair of word segments, build a distance matrix, using Euclidean distance to compare spectral vectors.

Example of Distance Matrix

Clustering Similar Utterances
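A small sketch of the pairwise distance matrix between two segments' spectral vectors (Euclidean distance per frame pair), over which a local alignment like the one above could then be run; the function and variable names are illustrative:

    import numpy as np

    def distance_matrix(seg_a, seg_b):
        # seg_a, seg_b: arrays of spectral vectors with shapes (Ta, F) and (Tb, F).
        # Entry (i, j) is the Euclidean distance between the i-th vector of one
        # segment and the j-th vector of the other.
        diff = seg_a[:, None, :] - seg_b[None, :, :]  # shape (Ta, Tb, F)
        return np.sqrt((diff ** 2).sum(axis=-1))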

Examples of Computed Clusters