Nders at NTCIR-13 Short Text Conversation


1 Nders at NTCIR-13 Short Text Conversation
Han Ni, Liansheng Lin, Ge Xu
NetDragon Websoft Inc.
Dec. 2017

2 System Architecture

Figure 1: System Architecture

3 Preprocessing
- Traditional-to-Simplified Chinese conversion
- Convert full-width characters into half-width ones
- Word segmentation (PKU standard)
- Replace numbers, times, and URLs with the tokens <_NUM>, <_TIME>, and <_URL> respectively
- Filter meaningless words and special symbols
(see the code sketch after the examples below)

4 Preprocessing Examples

Example 1 (Short Text ID: test-post): Traditional-to-Simplified conversion
Raw text: 去到美國, 还是吃中餐! 宮保雞丁家的感覺 ~ ("Go to the USA, still eating Chinese food! Kung Pao Chicken feels like home")
Without T-S conversion: 去到美國, 还是吃中餐! 宮保雞丁家的感覺
With T-S conversion: 去到美国, 还是吃中餐! 宫保鸡丁家的感觉
Clean result: 去到美国还是吃中餐宫保鸡丁家的感觉

Example 2 (Short Text ID: test-post): token replacement
Raw text: 汶川大地震 9 周年 :29 个让人泪流满面的瞬间 ("9th anniversary of the Wenchuan earthquake: 29 moments that bring people to tears")
Without token replacement: 汶川大地震 9 周年 : 29 个让人泪流满面的瞬间
With token replacement: 汶川大地震 <_NUM> 周年 : <_NUM> 个让人泪流满面的瞬间
Clean result: 汶川大地震 <_NUM> 周年 <_NUM> 个让人泪流满面的瞬间
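The steps above and these examples could be implemented roughly as follows. This is a minimal sketch assuming the jieba and opencc Python packages stand in for the team's (unspecified) PKU-standard segmenter and filter; URL handling is simplified, since jieba would otherwise split a URL into pieces.

```python
import re
import jieba                   # word segmentation; a PKU-style dictionary can be plugged in
from opencc import OpenCC      # Traditional -> Simplified conversion

t2s = OpenCC('t2s')

def full_to_half(text):
    """Convert full-width characters to their half-width equivalents."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:                 # ideographic space
            code = 0x20
        elif 0xFF01 <= code <= 0xFF5E:     # full-width ASCII variants
            code -= 0xFEE0
        out.append(chr(code))
    return ''.join(out)

def preprocess(text):
    text = t2s.convert(text)               # Traditional -> Simplified
    text = full_to_half(text)              # full-width -> half-width
    tokens = []
    for tok in jieba.cut(text):            # word segmentation
        tok = tok.strip()
        if not tok:
            continue
        if re.fullmatch(r'https?://\S+', tok) or tok.startswith('www.'):
            tokens.append('<_URL>')                        # URLs
        elif re.fullmatch(r'\d{1,2}:\d{2}(:\d{2})?', tok):
            tokens.append('<_TIME>')                       # times, e.g. 12:30
        elif re.fullmatch(r'\d+(\.\d+)?', tok):
            tokens.append('<_NUM>')                        # numbers
        elif re.fullmatch(r'\W+', tok):
            continue                                       # drop special symbols
        else:
            tokens.append(tok)
    return tokens

print(preprocess('汶川大地震 9 周年 :29 个让人泪流满面的瞬间'))
# e.g. ['汶川', '大地震', '<_NUM>', '周年', '<_NUM>', ...] (exact splits depend on jieba's dictionary)
```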

5 Similarity Features
- TF-IDF
- LSA (Latent Semantic Analysis)
- LDA (Latent Dirichlet Allocation)
- Word2Vec (skip-gram)
- LSTM-Sen2Vec

We combine each post with its corresponding comments into a single document, then train the LSA and LDA models on these documents (a training sketch follows below).
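A sketch of how such models could be trained with gensim (version ≥ 4 assumed, hence `vector_size`); the toy token lists stand in for the real post+comments documents, and the topic counts are illustrative, not the team's settings.

```python
from gensim import corpora, models

# Illustrative toy corpus: each training document is a post concatenated
# with all of its comments, already preprocessed into token lists.
post_comment_docs = [
    ['汶川', '大地震', '<_NUM>', '周年', '瞬间', '泪流满面', '难忘'],
    ['去到', '美国', '还是', '吃', '中餐', '宫保鸡丁', '好吃', '想家'],
]

dictionary = corpora.Dictionary(post_comment_docs)
bows = [dictionary.doc2bow(doc) for doc in post_comment_docs]

tfidf = models.TfidfModel(bows)                                        # TF-IDF
lsa = models.LsiModel(tfidf[bows], id2word=dictionary, num_topics=2)   # LSA
lda = models.LdaModel(bows, id2word=dictionary, num_topics=2)          # LDA
w2v = models.Word2Vec(post_comment_docs, sg=1,                         # sg=1: skip-gram
                      vector_size=50, min_count=1)
```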

6 LSTM

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \quad (1)$$
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \quad (2)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \quad (3)$$
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \quad (4)$$
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \quad (5)$$
$$h_t = o_t * \tanh(C_t) \quad (6)$$

Figure 2: The LSTM Cell

References:
Mikolov, Tomáš. Statistical Language Models Based on Neural Networks. Ph.D. thesis, Brno University of Technology, 2012.
Zaremba, Wojciech, Ilya Sutskever, and Oriol Vinyals. Recurrent Neural Network Regularization. arXiv preprint arXiv:1409.2329, 2014.
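Equations (1)-(6) as a single NumPy step, for readability only; shapes and the toy usage are illustrative, not the team's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, W_i, W_C, W_o, b_f, b_i, b_C, b_o):
    """One LSTM step; each W_* acts on the concatenation [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)             # (1) forget gate
    i_t = sigmoid(W_i @ z + b_i)             # (2) input gate
    C_tilde = np.tanh(W_C @ z + b_C)         # (3) candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde       # (4) new cell state
    o_t = sigmoid(W_o @ z + b_o)             # (5) output gate
    h_t = o_t * np.tanh(C_t)                 # (6) new hidden state
    return h_t, C_t

# Toy usage: hidden size 3, input size 2.
rng = np.random.default_rng(0)
H, D = 3, 2
Ws = [rng.normal(size=(H, H + D)) for _ in range(4)]   # W_f, W_i, W_C, W_o
bs = [np.zeros(H) for _ in range(4)]                   # b_f, b_i, b_C, b_o
h, C = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), *Ws, *bs)
```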

7 Attention Weight

Figure 3: Unidirectional weight distribution
Figure 4: Bidirectional weight distribution

8 LSTM-Sen2Vec

Figure 5: The Unidirectional LSTM
Figure 6: The Traditional Bidirectional LSTM

9 LSTM-Sen2Vec

Figure 7: The Modified Bidirectional LSTM

10 Candidates Generation

Similar posts:
$$\mathrm{Score}^1_{q,p}(q, p) = \mathrm{Sim}_{LDA}(q, p) \cdot \mathrm{Sim}_{W2V}(q, p) \cdot \mathrm{Sim}_{LSTM}(q, p) \quad (7)$$
$$\mathrm{Score}^2_{q,p}(q, p) = \mathrm{Sim}_{LSA}(q, p) \cdot \mathrm{Sim}_{W2V}(q, p) \cdot \mathrm{Sim}_{LSTM}(q, p) \quad (8)$$

Comment candidates:
$$\mathrm{Score}^1_{q,c}(q, c) = \mathrm{Sim}_{LSA}(q, c) \cdot \mathrm{Sim}_{W2V}(q, c) \quad (9)$$
$$\mathrm{Score}^2_{q,c}(q, c) = \mathrm{Sim}_{LDA}(q, c) \cdot \mathrm{Sim}_{W2V}(q, c) \quad (10)$$
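Reading Eqs. (7)-(10) as products of the individual similarities, a small generic helper covers all four scores; the similarity callables (`sim_lda`, `sim_w2v`, `sim_lstm`, `sim_lsa`) are assumed to be defined elsewhere.

```python
def candidate_scores(query, items, sims):
    """Product of the given similarity measures, as in Eqs. (7)-(10).

    sims: list of callables, e.g. [sim_lda, sim_w2v, sim_lstm] for Eq. (7)
    or [sim_lsa, sim_w2v] for Eq. (9); each maps (query, item) to a similarity.
    """
    scores = {}
    for item in items:
        s = 1.0
        for sim in sims:
            s *= sim(query, item)
        scores[item] = s
    return scores

# e.g. top similar posts by Eq. (7):
# top = sorted(candidate_scores(q, posts, [sim_lda, sim_w2v, sim_lstm]).items(),
#              key=lambda kv: kv[1], reverse=True)[:10]
```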

11 Ranking
- TextRank (words as vertices)
- Pattern-IDF
- Pattern-IDF + TextRank (sentences as vertices)

12 TextRank - A Graph-Based Ranking Model

Formally, let G = (V, E) be an undirected graph with a set of vertices V and a set of edges E, where E is a subset of V × V. For a given vertex V_i, let link(V_i) be the set of vertices linked to it. The score of a vertex V_i is defined as follows:

$$WS(V_i) = (1 - d) + d \sum_{j \in link(V_i)} w_{ij} \, WS(V_j) \quad (11)$$

where d is a damping factor [1] that is usually set to 0.85.

[1] Brin, Sergey, and Lawrence Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Proceedings of the Seventh International Conference on World Wide Web, Elsevier Science Publishers B.V., 1998.

13 TextRank - Vertices and Edges
- Vertices: each unique word in the candidates
- Edges: a co-occurrence relation
- Edge weights: the Word2Vec similarity between the two words combined with the number of their co-occurrences (their product; see M_ij on the next slide)

14 TextRank - Iterative Calculation

For N candidates containing k distinct words in total, we construct a k × k matrix M with $M_{ij} = cnt \cdot sim(D_i, D_j)$. Then we compute iteratively:

$$R(t + 1) = d \begin{bmatrix} M_{11} & M_{12} & \cdots & M_{1k} \\ M_{21} & M_{22} & \cdots & M_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ M_{k1} & M_{k2} & \cdots & M_{kk} \end{bmatrix} R(t) + \begin{bmatrix} (1-d)/k \\ (1-d)/k \\ \vdots \\ (1-d)/k \end{bmatrix}$$

We stop when $|R(t+1) - R(t)| < \epsilon$, with $\epsilon = 10^{-7}$. Here, cnt refers to the number of co-occurrences of D_i and D_j within a sentence.
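A sketch of this power iteration in NumPy, under the slide's definitions; d = 0.85 is assumed from slide 12, and M is used unnormalized exactly as written, so `max_iter` guards against non-convergence.

```python
import numpy as np

def textrank(M, d=0.85, eps=1e-7, max_iter=1000):
    """Iterate R(t+1) = d*M*R(t) + (1-d)/k until |R(t+1) - R(t)| < eps."""
    k = M.shape[0]
    R = np.full(k, 1.0 / k)                      # uniform initialization
    for _ in range(max_iter):
        R_next = d * (M @ R) + (1.0 - d) / k
        if np.abs(R_next - R).sum() < eps:
            return R_next
        R = R_next
    return R

# Toy 3-word graph (symmetric M_ij = cnt * sim):
M = np.array([[0.0, 0.8, 0.1],
              [0.8, 0.0, 0.3],
              [0.1, 0.3, 0.0]])
print(textrank(M))
```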

15 TextRank - Ranking

Once we have the score R(D_i) for each word D_i in the candidates, the score of each comment candidate c is calculated as:

$$Rank_{TextRank}(c) = \frac{\sum_{D_i \in c} R(D_i)}{len(c)} \quad (12)$$

Here, len(c) refers to the number of words in comment c.
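With the converged word scores, Eq. (12) is a simple average; `word_score` below is assumed to map each word to its entry of the vector produced by the iteration above.

```python
def rank_textrank(comment_tokens, word_score):
    """Eq. (12): average TextRank score of the words in a candidate comment."""
    return sum(word_score.get(w, 0.0) for w in comment_tokens) / len(comment_tokens)
```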

16 Pattern-IDF

For a word D_i (the minor word) in a comment, given a word D_j (the major word) in the corresponding post, we define (D_j, D_i) as a pattern. Inspired by IDF, we calculate the Pattern-IDF as:

$$PI(D_i \mid D_j) = 1 \Big/ \log_2 \frac{count_c(D_i) \cdot count_p(D_j)}{count_{pair}(D_i, D_j)} \quad (13)$$

Here, count_c refers to the number of occurrences in comments, count_p in posts, and count_pair in post-comment pairs. Patterns whose count_pair(D_i, D_j) is less than 3 are eliminated.
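A sketch of Eq. (13) computed from raw post-comment pairs. One assumption to flag: each word is counted once per post/comment (document counts); the slide does not say whether the team used document or token counts.

```python
import math
from collections import Counter

def build_pattern_idf(pairs, min_pair_count=3):
    """pairs: iterable of (post_tokens, comment_tokens).
    Returns {(major_word, minor_word): PI} per Eq. (13)."""
    count_p, count_c, count_pair = Counter(), Counter(), Counter()
    for post, comment in pairs:
        count_p.update(set(post))              # once per post
        count_c.update(set(comment))           # once per comment
        for dj in set(post):
            for di in set(comment):
                count_pair[(dj, di)] += 1

    pi = {}
    for (dj, di), n_pair in count_pair.items():
        if n_pair < min_pair_count:            # patterns seen < 3 times are dropped
            continue
        x = count_c[di] * count_p[dj] / n_pair # X >= 1 by construction (slide 17)
        if x <= 1.0:
            continue                           # defensive: 1/log2(x) diverges at x == 1
        pi[(dj, di)] = 1.0 / math.log2(x)
    return pi
```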

17 Pattern-IDF

Let $X = \frac{count_c(D_i) \cdot count_p(D_j)}{count_{pair}(D_i, D_j)}$; then $X \in [1, \infty)$.

Figure 8: log(x)
Figure 9: 1/log(x)

18 PI - Example

Table 1: Examples of Pattern-IDF, e.g. the major word (China Mobile) paired with minor words (connect), cmcc, (charges), (business hall), (roaming), (me), (be), (of)

Table 2: The entropy of Pattern-IDF for each major word, e.g. (eye disease), (harvest year), (plasma), (vertebrate), (gouache painting), (now), (what), (be)

$$PI_{norm}(D_i \mid D_j) = \frac{PI(D_i \mid D_j)}{\sum_{i=1}^{n} PI(D_i \mid D_j)} \quad (14)$$

$$H(D_j) = -\sum_{i=1}^{n} PI_{norm}(D_i \mid D_j) \log_2 PI_{norm}(D_i \mid D_j) \quad (15)$$
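Equations (14)-(15) normalize each major word's PI values over its minor words and take the entropy of that distribution; a sketch reusing the `pi` mapping from the earlier snippet.

```python
import math
from collections import defaultdict

def major_word_entropy(pi):
    """Eqs. (14)-(15): entropy of the normalized PI distribution per major word."""
    by_major = defaultdict(dict)
    for (dj, di), v in pi.items():
        by_major[dj][di] = v
    entropy = {}
    for dj, minors in by_major.items():
        total = sum(minors.values())
        probs = [v / total for v in minors.values()]           # Eq. (14)
        entropy[dj] = -sum(p * math.log2(p) for p in probs)    # Eq. (15)
    return entropy
```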

19 PI - Ranking

For each comment c in the candidates, given a query (new post) q, we calculate the PI score as follows:

$$Score_{PI}(q, c) = \frac{\sum_{D_j \in q} \sum_{D_i \in c} PI(D_i \mid D_j)}{len(c) \cdot len(q)} \quad (16)$$

Then we define the rank score as follows:

$$Rank_{PI} = \left(1 + \frac{Score_{PI}(q, c)}{\max Score_{PI}(q, c)}\right) \cdot Sim_{W2V}(q, c) \cdot Sim_{LSA}(q, c) \quad (17)$$
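A sketch of Eqs. (16)-(17); `pi` is the Pattern-IDF mapping from the earlier snippet, the sim_* callables are assumed defined elsewhere, and the max in Eq. (17) is taken over the current candidate set.

```python
def score_pi(q_tokens, c_tokens, pi):
    """Eq. (16): mean Pattern-IDF over all (major word, minor word) pairs."""
    total = sum(pi.get((dj, di), 0.0) for dj in q_tokens for di in c_tokens)
    return total / (len(c_tokens) * len(q_tokens))

def rank_pi(q_tokens, candidates, pi, sim_w2v, sim_lsa):
    """Eq. (17): candidates maps a comment id to its token list."""
    scores = {c: score_pi(q_tokens, toks, pi) for c, toks in candidates.items()}
    max_s = max(scores.values()) or 1.0      # guard against an all-zero PI column
    return {c: (1.0 + scores[c] / max_s)
               * sim_w2v(q_tokens, candidates[c])
               * sim_lsa(q_tokens, candidates[c])
            for c in candidates}
```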

20 TextRank + Pattern-IDF

In this method, we add each comment sentence in the candidates as a vertex in the graph and use sentence-level Word2Vec similarity as the edge weight between vertices. For N candidates, we construct an N × N matrix M with $M_{ij} = Sim_{w2v}(candidate_i, candidate_j)$.

At time t = 0, we initialize an N-dimensional vector P, where N is the number of comment candidates. Each entry of P is defined as the Pattern-IDF score between the query (new post) q and the corresponding comment c_i in the candidates:

$$P_i = Score_{PI}(q, c_i) \quad (18)$$

21 TextRank + Pattern-IDF

Then we compute iteratively:

$$R(t + 1) = d \begin{bmatrix} M_{11} & M_{12} & \cdots & M_{1N} \\ M_{21} & M_{22} & \cdots & M_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ M_{N1} & M_{N2} & \cdots & M_{NN} \end{bmatrix} R(t) + \begin{bmatrix} (1-d)/N \\ (1-d)/N \\ \vdots \\ (1-d)/N \end{bmatrix}$$

We stop when $|R(t+1) - R(t)| < \epsilon$, with $\epsilon = 10^{-7}$. Finally, we obtain a score for each comment in the candidates.
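A sketch of the combined step: the Pattern-IDF scores seed the rank vector (Eq. 18) and sentence-level Word2Vec similarities form the transition matrix. It reuses `score_pi` from the sketch above; `sim_w2v` on token lists and d = 0.85 are assumptions.

```python
import numpy as np

def rank_pi_textrank(q_tokens, cand_tokens, pi, sim_w2v,
                     d=0.85, eps=1e-7, max_iter=1000):
    """Sentence-level TextRank over comment candidates, seeded by Pattern-IDF."""
    n = len(cand_tokens)
    # Edges: sentence-level Word2Vec similarity between candidate comments.
    M = np.array([[sim_w2v(ci, cj) for cj in cand_tokens] for ci in cand_tokens])
    # Initialization R(0) = P, Eq. (18).
    R = np.array([score_pi(q_tokens, c, pi) for c in cand_tokens])
    for _ in range(max_iter):
        R_next = d * (M @ R) + (1.0 - d) / n
        done = np.abs(R_next - R).sum() < eps
        R = R_next
        if done:
            break
    return R   # final score for each comment candidate
```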

22 Experiment

Nders-C-R5: LDA + Word2Vec + LSTM-Sen2Vec
Nders-C-R4: LSA + Word2Vec + LSTM-Sen2Vec
Nders-C-R3: R4 + TextRank (words as vertices)
Nders-C-R2: R4 + Pattern-IDF
Nders-C-R1: R4 + Pattern-IDF + TextRank (sentences as vertices)

23 Official Results

Table 3: The official results of the five runs (Nders-C-R1 through Nders-C-R5) for the Nders team on the three official mean metrics, including Mean P+ (the per-run scores were not preserved in this transcription).

R2 vs. R4: 0.77%, 2.98%, 1.26% on the three metrics respectively.

24 Questions?
