The representation of word and sentence

1 word2vec Jul 4, 2017

2 Presentation Outline

3 Discrete representation
Taxonomy: WordNet
Example: good

4 Problems
Synonyms lose their nuances: adept, expert, good
It can't keep up to date
It can't give an accurate similarity

5 Vector representation
One-hot vector: [0, 0, 0, ..., 1, 0, ..., 0, 0]
Takes too much space
All vectors are orthogonal
Hard to compute similarity
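The orthogonality problem can be checked directly. The following is a minimal sketch, assuming a hypothetical four-word vocabulary (the words and the helper names `one_hot` and `dot` are illustrative, not from the slides): the dot product of any two different one-hot vectors is 0, so "room" looks no closer to "bedroom" than to "cat".

```python
def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec

def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical toy vocabulary, for illustration only.
vocab = ["cat", "dog", "room", "bedroom"]
vectors = {word: one_hot(i, len(vocab)) for i, word in enumerate(vocab)}

same = dot(vectors["room"], vectors["room"])          # a word with itself
different = dot(vectors["room"], vectors["bedroom"])  # two related words
```

Each vector is as long as the vocabulary, which is also why the representation takes so much space.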

6 Dense vector
Represent a word by its neighbors
Example:
The cat is running in a room
A dog is walking in a bedroom

7 Co-occurrence Matrix
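As a rough illustration, a co-occurrence matrix for the two example sentences from the dense-vector slide can be counted as below. This is a sketch under stated assumptions: the window size of 1, the lowercasing, and the function name `cooccurrence` are choices made here, not taken from the slides. Only nonzero cells are stored, as a dictionary keyed by (word, neighbor).

```python
from collections import defaultdict

def cooccurrence(sentences, window=1):
    """Count how often each (center, neighbor) pair occurs within `window` words."""
    counts = defaultdict(int)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, center in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(center, tokens[j])] += 1
    return counts

sentences = [
    "The cat is running in a room",
    "A dog is walking in a bedroom",
]
X = cooccurrence(sentences, window=1)
vocab = sorted({w for s in sentences for w in s.lower().split()})
```

Even on this tiny corpus most of the `len(vocab) * len(vocab)` cells are empty, and "room" and "bedroom" end up with the same neighbor ("a"), which is the signal a dense representation tries to keep.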

8 Problems
Extremely sparse
Hard to update
High dimension

9 A neural probabilistic language model (Bengio et al., 2003)


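11 Most common method: word2vec
CBOW
One-hot vectors for the words around the center word: $x_{c-m}, x_{c-m+1}, \dots, x_{c-1}, x_{c+1}, \dots, x_{c+m}$
$v_i = V x_i \quad (i = c-m, \dots, c+m,\ i \neq c)$
$\hat{v} = \mathrm{mean}(v_i)$
$z = U \hat{v}$
$\hat{y} = \mathrm{softmax}(z)$
$J(\theta) = -\log P(u_c \mid \hat{v}) = -u_c^T \hat{v} + \log \sum_{j=1}^{|V|} \exp(u_j^T \hat{v})$
Here $V$ and $U$ are the input and output embedding matrices, and $|V|$ is the vocabulary size.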
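The CBOW steps above can be sketched in plain Python. This is a minimal forward pass only, with assumptions labeled: the embeddings are stored as one row per word (so multiplying by a one-hot vector becomes a row lookup), the sizes and random initial values are arbitrary, and `cbow_forward` is a name chosen here. No training step is shown.

```python
import math
import random

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(s - m) for s in z]
    total = sum(exps)
    return [e / total for e in exps]

def cbow_forward(V, U, context_ids, center_id):
    """Return (y_hat, loss) for one CBOW example.

    V, U: lists of per-word vectors (input and output embeddings).
    """
    dim = len(V[0])
    # v_hat: mean of the context words' input vectors.
    v_hat = [sum(V[i][d] for i in context_ids) / len(context_ids)
             for d in range(dim)]
    # z = U v_hat: one score per vocabulary word.
    z = [sum(u[d] * v_hat[d] for d in range(dim)) for u in U]
    y_hat = softmax(z)
    # J = -u_c^T v_hat + log sum_j exp(u_j^T v_hat)
    loss = -z[center_id] + math.log(sum(math.exp(s) for s in z))
    return y_hat, loss

random.seed(0)
vocab_size, dim = 6, 3  # arbitrary toy sizes
V = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
U = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
y_hat, loss = cbow_forward(V, U, context_ids=[0, 1, 3, 4], center_id=2)
```

The loss computed this way equals `-log(y_hat[center_id])`, which is the cross entropy between the prediction and the one-hot center word.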

12 Skip-gram
One-hot vector for the center word: $x_c$
$v_c = V x_c$
$z = U v_c$
$\hat{y} = \mathrm{softmax}(z)$
$J(\theta) = -\sum_{j=0,\, j \neq m}^{2m} u_{c-m+j}^T v_c + 2m \log \sum_{k=1}^{|V|} \exp(u_k^T v_c)$
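The skip-gram objective can be written out the same way. The sketch below is an illustration with hypothetical toy embeddings (one row per word, values chosen arbitrarily); `skipgram_loss` is a name introduced here. The number of context words plays the role of 2m in the formula.

```python
import math

def skipgram_loss(V, U, center_id, context_ids):
    """Skip-gram loss for one center word and its context words."""
    v_c = V[center_id]
    # z = U v_c: one score per vocabulary word.
    z = [sum(a * b for a, b in zip(u, v_c)) for u in U]
    log_norm = math.log(sum(math.exp(s) for s in z))
    # -sum of context scores + (number of context words) * log normalizer
    return -sum(z[o] for o in context_ids) + len(context_ids) * log_norm

# Hypothetical 4-word vocabulary with 2-dimensional embeddings.
V = [[0.1, -0.2], [0.0, 0.3], [0.4, 0.1], [-0.3, 0.2]]
U = [[0.2, 0.1], [-0.1, 0.0], [0.3, -0.4], [0.0, 0.5]]
loss = skipgram_loss(V, U, center_id=1, context_ids=[0, 2])

# With all-zero embeddings every word is equally likely,
# so the loss is (number of context words) * log(vocabulary size).
zero = [[0.0, 0.0] for _ in range(4)]
uniform_loss = skipgram_loss(zero, zero, center_id=1, context_ids=[0, 2])
```

The result is the sum of the negative log softmax probabilities of the context words, since the same normalizer is shared by every context position.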


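15 GloVe
word2vec can capture complex linguistic patterns, but it can't use global co-occurrence statistics.
GloVe combines the co-occurrence matrix and word2vec.
From skip-gram: $Q_{ij} = \dfrac{\exp(w_i^T \hat{w}_j)}{\sum_{k=1}^{V} \exp(w_i^T \hat{w}_k)}$
$J = -\sum_{i \in \text{corpus},\ j \in \text{context}(i)} \log Q_{ij}$
This is hard to compute, because every term needs the normalization over the whole vocabulary.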

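16 GloVe
$J = -\sum_{i=1}^{V} \sum_{j=1}^{V} X_{ij} \log Q_{ij}$, where $X_{ij}$ comes from the co-occurrence matrix $X$
$J = \sum_{i=1}^{V} X_i H(P_i, Q_i)$, where $X_i = \sum_k X_{ik}$, $P_{ij} = X_{ij} / X_i$ and $H$ is the cross entropy
Replace cross entropy with least squares:
$J = \sum_{i,j} X_i (\hat{P}_{ij} - \hat{Q}_{ij})^2$, with $\hat{P}_{ij} = X_{ij}$ and $\hat{Q}_{ij} = \exp(w_i^T \hat{w}_j)$
$X_{ij}$ may be large, so compare logarithms instead:
$J = \sum_{i,j} X_i (\log \hat{P}_{ij} - \log \hat{Q}_{ij})^2 = \sum_{i,j} X_i (w_i^T \hat{w}_j - \log X_{ij})^2$
Final: $J = \sum_{i,j} f(X_{ij}) (w_i^T \hat{w}_j - \log X_{ij})^2$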
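The final weighted least-squares objective can be evaluated as follows. This is a sketch, not the reference implementation: the slide does not define f, so the weighting function below is the one from the GloVe paper with its suggested defaults (x_max = 100, alpha = 0.75), and the bias terms used in the paper are left out to match the formula on the slide. The names `weight` and `glove_loss` are chosen here.

```python
import math

def weight(x, x_max=100.0, alpha=0.75):
    """Weighting f(x) as in the GloVe paper (an assumption; the slide only names f)."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_loss(W, W_ctx, X):
    """Sum of f(X_ij) * (w_i . w_hat_j - log X_ij)^2 over nonzero counts.

    W, W_ctx: lists of word and context vectors (one row per word).
    X: dict mapping (i, j) -> co-occurrence count, nonzero entries only.
    """
    total = 0.0
    for (i, j), x_ij in X.items():
        score = sum(a * b for a, b in zip(W[i], W_ctx[j]))
        total += weight(x_ij) * (score - math.log(x_ij)) ** 2
    return total

# Toy check: when w_i . w_hat_j already equals log X_ij, the loss is zero.
perfect = glove_loss([[1.0]], [[math.log(5.0)]], {(0, 0): 5.0})
# With zero vectors and a count at x_max, the loss is (log 100)^2.
untrained = glove_loss([[0.0]], [[0.0]], {(0, 0): 100.0})
```

Only nonzero counts are visited, so the cost grows with the number of observed pairs rather than with the square of the vocabulary size.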


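19 Evaluating word2vec
Evaluation methods for unsupervised word embeddings (Tobias Schnabel)
Intrinsic: test the vectors directly, for example on analogies
king - queen = man - woman
bad - worst = good - best
Fast, but it is unsure whether the result carries over to a real task
Extrinsic: use word vectors as inputs for an elaborate machine learning system and compute on your task
Slow but useful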
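An intrinsic analogy test of the kind listed above can be sketched with cosine similarity. The vectors below are hypothetical 2-dimensional toy values made up for illustration (real embeddings have many more dimensions), and `cosine` and `analogy` are names introduced here.

```python
import math

def cosine(a, b):
    """Cosine similarity of two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' by maximizing cos(d, b - a + c).

    The three query words are excluded from the candidates.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Hypothetical toy vectors: axis 0 is roughly "royalty", axis 1 roughly "gender".
toy = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.2],
}
answer = analogy(toy, "man", "king", "woman")
```

On these toy vectors `answer` is `"queen"`, because king - man + woman lands exactly on the queen vector. The test is fast, but a high analogy score does not guarantee better results on a downstream task.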


21 What affects word embedding performance?
Performance is heavily dependent on the model used.
Performance increases with larger corpus sizes.
Performance is lower for extremely low as well as for extremely high dimensional vectors; within a reasonable range, larger dimensions tend to give better performance.
Corpus domain is more important than corpus size.
For a small corpus (< 500M), use skip-gram; for a big corpus, use CBOW.
Iteration; at least 50 dimensions.

22 Ambiguity
A word may have several meanings, like: tie
Linear Algebraic Structure of Word Senses, with Applications to Polysemy (Sanjeev Arora)
$v_{tie} \approx \alpha_1 v_{tie_1} + \alpha_2 v_{tie_2} + \alpha_3 v_{tie_3} + \dots$
$\alpha_i$ is related to the frequency of $tie_i$
Given the word vectors (about 60000) and an upper bound $m$, find a set of context vectors $A_1, A_2, \dots, A_m$ such that:
$v_w = \sum_{j=1}^{m} \alpha_{w,j} A_j + \eta_w$
where at most $k$ of the $\alpha_{w,j}$ are nonzero.
This is just sparse coding (k-SVD).
Find a set of bases in the vector space; each word can be represented by the bases.

23 Ambiguity

24 Problems
In a bag-of-words representation, "powerful", "strong" and "Paris" are equally distant.
A bag of word vectors loses the ordering of the words and ignores the semantics of the words.


26 Doc2vec
Distributed Representations of Sentences and Documents
Distributed Memory Model of Paragraph Vectors (PV-DM)
The paragraph token can be thought of as another word.

27 Distributed Bag of Words version of Paragraph Vector (PV-DBOW)
A combination of the two methods works better.

28 A Simple but Tough-to-Beat Baseline for Sentence Embeddings (Sanjeev Arora, 2017)

29 Some other methods
Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks
MV-RNNs (Matrix-Vector Recursive Neural Networks)

30 That's all. Thanks.
