Eigenwords: a quick intro
Lyle Ungar, University of Pennsylvania
Represent each word by its context

Corpus: "I ate ham."  "You ate cheese."  "You ate."

Context counts (one row per word; columns count the word immediately before and immediately after it):

                Word Before               Word After
         ate  cheese  ham   I   You   ate  cheese  ham   I   You
ate       0     0      0    1    2     0     1      1    0    0
cheese    1     0      0    0    0     0     0      0    0    0
ham       1     0      0    0    0     0     0      0    0    0
I         0     0      0    0    0     1     0      0    0    0
You       0     0      0    0    0     2     0      0    0    0

Hypothesis: words with similar contexts have similar meanings
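A minimal sketch of how such a count matrix can be built (the corpus is from the slide; the code itself is illustrative, not the authors'):

```python
import numpy as np

# Toy corpus from the slide.
sentences = [["I", "ate", "ham"], ["You", "ate", "cheese"], ["You", "ate"]]
vocab = sorted({w for s in sentences for w in s})  # ['I', 'You', 'ate', 'cheese', 'ham']
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# C[w, j]     counts how often vocab[j] occurs immediately BEFORE word w;
# C[w, V + j] counts how often vocab[j] occurs immediately AFTER word w.
C = np.zeros((V, 2 * V))
for s in sentences:
    for i, w in enumerate(s):
        if i > 0:
            C[idx[w], idx[s[i - 1]]] += 1      # word before
        if i < len(s) - 1:
            C[idx[w], V + idx[s[i + 1]]] += 1  # word after

print(vocab)
print(C)  # reproduces the table above, with rows/columns in vocabulary order
```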
Eigenwords

• Project the high-dimensional contexts into a low-dimensional space (SVD/PCA)
• Similar words are close in this low-dimensional space

(Same corpus and context-count matrix as on the previous slide.)
Eigenwords as SVD

• Left singular vectors are eigenwords
  - a vector representing each word (word embeddings)
• Right singular vectors are eigentokens
  - vectors representing the contexts

(Same context-count matrix as above; a minimal sketch follows.)
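A minimal, self-contained sketch of this step (my code, not the authors' pipeline): take the SVD of the count matrix from slide 2 and keep the top k singular vectors. On this tiny corpus, k = 3 captures all of the structure.

```python
import numpy as np

vocab = ["ate", "cheese", "ham", "I", "You"]
# The [word-before | word-after] count matrix from the table above.
C = np.array([
    [0, 0, 0, 1, 2,  0, 1, 1, 0, 0],  # ate
    [1, 0, 0, 0, 0,  0, 0, 0, 0, 0],  # cheese
    [1, 0, 0, 0, 0,  0, 0, 0, 0, 0],  # ham
    [0, 0, 0, 0, 0,  1, 0, 0, 0, 0],  # I
    [0, 0, 0, 0, 0,  2, 0, 0, 0, 0],  # You
], dtype=float)

k = 3                                  # embedding dimension
U, S, Vt = np.linalg.svd(C, full_matrices=False)
eigenwords = U[:, :k]                  # left singular vectors: one vector per word
eigentokens = Vt[:k, :].T              # right singular vectors: one vector per context column

for w, v in zip(vocab, eigenwords):
    print(w, np.round(v, 3))           # "cheese" and "ham" get identical vectors
```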
Similar words are close

[Scatter plot of words on the first two principal components (PC 1 vs PC 2). Clusters include family terms (wife, husband, mother, father, sister, brother, daughter, son, uncle), units of measure (miles, inches, pounds, acres, bytes, tons, degrees, meters, barrels), people and occupations (man, woman, boy, girl, guy, doctor, lawyer, farmer, teacher, citizen), and physical quantities (stress, pressure, gravity, tension, temperature, permeability, density, viscosity).]
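Continuing the sketch above, "close" can be made concrete with cosine similarity between eigenword vectors (reusing `eigenwords` and the vocabulary order from the previous snippet):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Indices into vocab = ["ate", "cheese", "ham", "I", "You"].
print(cosine(eigenwords[1], eigenwords[2]))  # cheese vs ham: 1.0 (identical contexts)
print(cosine(eigenwords[0], eigenwords[2]))  # ate vs ham: 0.0 on this toy corpus
```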
Nouns and verbs

[Scatter plot (PC 1 vs PC 2): nouns (river, house, word, boat, truck, cat, car, home, dog) cluster apart from verbs (talk, push, sleep, drink, disagree, carry, eat, listen).]
Pronouns

[Scatter plot (PC 1 vs PC 2) of pronouns: we, i, they, he, she, you, us, our, them, him, his, her.]
Numbers

[Scatter plot (PC 1 vs PC 2): spelled-out numbers (one through ten), digits (1-10), and years (1995-2009) form separate clusters.]
Names

[Scatter plot (PC 1 vs PC 2): female names (tricia, jennifer, lisa, liz, betsy, karen, linda, patricia, betty, nancy, susan, dorothy, mary, maria, helen, barbara, margaret, elizabeth) cluster apart from male names (tom, bob, joe, mike, dan, michael, david, daniel, richard, christopher, robert, john, donald, william, paul, charles, george, joseph, thomas).]
Cool math! Useful!

• We use these reduced-dimensional representations to estimate Hidden Markov Models (HMMs) and other dynamic Bayesian models (e.g., parsing models)
  - Current solution methods (EM, Gibbs sampling) are iterative, which makes it hard to prove properties of their solutions
• Spectral (eigenword) methods have clean optimality and sample-complexity results
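For concreteness, one well-known spectral construction of this kind is the observable-operator HMM estimator of Hsu, Kakade & Zhang (2009); it is shown here to illustrate the flavor of these methods, not necessarily the exact algorithm the authors use. Let P_1 hold the estimated unigram probabilities Pr[x1 = i], P_{2,1} the bigram probabilities Pr[x2 = i, x1 = j], P_{3,x,1} the trigram slices Pr[x3 = i, x2 = x, x1 = j], and let U be the top-m left singular vectors of P_{2,1} (the same SVD machinery as the eigenwords above). Then:

```latex
% Observable-operator parameters, computed in closed form (no EM iterations):
\hat{b}_1      = \hat{U}^\top \hat{P}_1, \qquad
\hat{b}_\infty = \bigl(\hat{P}_{2,1}^\top \hat{U}\bigr)^{+} \hat{P}_1, \qquad
\hat{B}_x      = \bigl(\hat{U}^\top \hat{P}_{3,x,1}\bigr)\bigl(\hat{U}^\top \hat{P}_{2,1}\bigr)^{+}

% Probability of an observation sequence:
\widehat{\Pr}[x_1, \dots, x_t] = \hat{b}_\infty^\top \, \hat{B}_{x_t} \cdots \hat{B}_{x_1} \, \hat{b}_1
```

Because every quantity is a plug-in estimate from co-occurrence counts followed by an SVD and a pseudo-inverse, consistency and sample-complexity guarantees can be proved directly, unlike for EM.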
Use eigenwords/eigentokens in supervised learning

• Predict labels for tokens based on their estimated state vector (see the sketch after this list)
  - Part of speech
  - Named-entity type (person, place, thing, ...)
  - Word-sense ("meaning") disambiguation
• Use in parsing
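A hedged sketch of the first bullet, with made-up embeddings and tags (none of this data is from the slides): use a token's embedding, concatenated with its neighbors' embeddings, as features for an off-the-shelf classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Placeholder vectors; in practice these would be real eigenword embeddings.
emb = {w: rng.normal(size=4) for w in ["I", "You", "ate", "cheese", "ham", "<s>", "</s>"]}

def features(sent, i):
    """State-vector features for token i: [prev word | word | next word] embeddings."""
    prev = sent[i - 1] if i > 0 else "<s>"
    nxt = sent[i + 1] if i < len(sent) - 1 else "</s>"
    return np.concatenate([emb[prev], emb[sent[i]], emb[nxt]])

# Tiny hypothetical training set for part-of-speech tagging.
train = [(["I", "ate", "ham"], ["PRON", "VERB", "NOUN"]),
         (["You", "ate", "cheese"], ["PRON", "VERB", "NOUN"])]
X = np.array([features(s, i) for s, _ in train for i in range(len(s))])
y = [tag for _, tags in train for tag in tags]

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([features(["You", "ate", "ham"], 2)]))  # expected: ['NOUN']
```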
Word2vec

• Word embeddings, now often learned with neural ("deep learning") methods, are very popular
  - They often match the performance of the simpler eigenwords (toy run below)
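For comparison, a minimal word2vec run on the toy corpus, assuming the gensim library (version 4+); a real run would of course need a far larger corpus:

```python
from gensim.models import Word2Vec

sentences = [["I", "ate", "ham"], ["You", "ate", "cheese"], ["You", "ate"]]
model = Word2Vec(sentences, vector_size=10, window=2, min_count=1, sg=1, seed=0)

print(model.wv["ham"])               # the learned embedding for "ham"
print(model.wv.most_similar("ham"))  # nearest neighbors in embedding space
```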
Word Sense Disambiguation

• Estimate a state vector for each occurrence of a word
• Occurrences with similar meanings will be close (see the sketch below)
  - The ships dock in the port.
  - The port is loaded onto ships and sent to America.
  - The meat is tender.
  - I have tender feelings for her.
  - The company will tender an offer.
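A hedged sketch of the idea, with placeholder vectors standing in for real eigenwords: represent each occurrence of an ambiguous word by the average embedding of its context words, so that same-sense occurrences land near each other.

```python
import numpy as np

rng = np.random.default_rng(1)
emb = {}  # placeholder vectors; in practice, real eigenword embeddings

def vec(w):
    return emb.setdefault(w.lower(), rng.normal(size=4))

def token_state(sent, i, window=2):
    """Average embedding of the words around position i."""
    ctx = sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]
    return np.mean([vec(w) for w in ctx], axis=0)

s1 = "The ships dock in the port".split()
s2 = "The port is loaded onto ships".split()
v1 = token_state(s1, s1.index("port"))
v2 = token_state(s2, s2.index("port"))

sim = float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
print(sim)  # with real eigenwords, same-sense pairs should score higher than cross-sense pairs
```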