Similar documents
Posterior vs. Parameter Sparsity in Latent Variable Models Supplementary Material

Part-of-Speech Tagging + Neural Networks CS 287

CS838-1 Advanced NLP: Hidden Markov Models

Hidden Markov Models

HMM and Part of Speech Tagging. Adam Meyers New York University

CS 545 Lecture XVI: Parsing

LECTURER: BURCU CAN Spring

Hidden Markov Models (HMMs)

Neural POS-Tagging with Julia

10/17/04. Today's Main Points

Natural Language Processing

Parsing with Context-Free Grammars

Probabilistic Graphical Models

Degree in Mathematics

CSE 517 Natural Language Processing, Winter 2015

Part of Speech Tagging: Viterbi, Forward, Backward, Forward-Backward, Baum-Welch. COMP-599, Oct 1, 2015

INF4820: Algorithms for Artificial Intelligence and Natural Language Processing. Hidden Markov Models

NLP Programming Tutorial 11 - The Structured Perceptron

Hidden Markov Models, Part 1. Steven Bedrick CS/EE 5/655, 10/22/14

Time Zones - KET Grammar

Applied Natural Language Processing

CS460/626: Natural Language Processing/Speech, NLP and the Web (Lecture 8: POS tagset). Pushpak Bhattacharyya, CSE Dept., IIT Bombay, 17th Jan 2012

Part-of-Speech Tagging

CS388: Natural Language Processing Lecture 4: Sequence Models I

Empirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs

Statistical methods in NLP, lecture 7 Tagging and parsing

Probabilistic Context-Free Grammars. Michael Collins, Columbia University

Sequence Labeling: HMMs & Structured Perceptron

INF4820: Algorithms for Artificial Intelligence and Natural Language Processing. Language Models & Hidden Markov Models

Lecture 6: Part-of-speech tagging

Part-of-Speech Tagging

Extraction of Opposite Sentiments in Classified Free Format Text Reviews

Language Processing with Perl and Prolog

Posterior Sparsity in Unsupervised Dependency Parsing

Sequential Data Modeling - The Structured Perceptron

Probabilistic Context Free Grammars. Many slides from Michael Collins

1 Words and Tokens (8 points)

Lecture 13: Structured Prediction

Using Part-of-Speech Information for Transfer in Text Classification

Extracting Information from Text

We would like to describe this population. Central tendency (mean), variability (standard deviation): σ = √(Σ(X − X̄)² / N)

CS 6120/CS4120: Natural Language Processing

Terry Gaasterland, Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA

The SUBTLE NL Parsing Pipeline: A Complete Parser for English Mitch Marcus University of Pennsylvania

Hidden Markov Models

Boosting Applied to Tagging and PP Attachment

Automated Biomedical Text Fragmentation in Support of Biomedical Sentence Fragment Classification

Constituency Parsing

NLTK tagging (L435/L555, Dept. of Linguistics, Indiana University, Fall): basic tagging, tagged corpora, POS tagging

A Deterministic Word Dependency Analyzer Enhanced With Preference Learning

with Local Dependencies

Analyzing the Errors of Unsupervised Learning

CS395T: Structured Models for NLP Lecture 5: Sequence Models II

Lecture 12: Algorithms for HMMs

Sentence-level processing

COMP (Fall 2017) Natural Language Processing (with deep learning and connections to vision/robotics)

Probabilistic Context Free Grammars. Many slides from Michael Collins and Chris Manning

A Context-Free Grammar

ACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging

Lecture 9: Hidden Markov Model

Computational Linguistics 13A: Log-Likelihood Dependency Parsing. CSC 2501/485, Fall 2017

A Supertag-Context Model for Weakly-Supervised CCG Parser Learning

Maxent Models and Discriminative Estimation

SAT - KEY TO ABBREVIATIONS

More on HMMs and other sequence models. Intro to NLP - ETHZ - 18/03/2013

Lecture 7: Sequence Labeling

Multi-Component Word Sense Disambiguation

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

Parsing. Based on presentations from Chris Manning's course on Statistical Parsing (Stanford)

CSE 490 U Natural Language Processing Spring 2016

Hidden Markov Models. Algorithms for NLP September 25, 2014

CS460/626: Natural Language

Spectral Unsupervised Parsing with Additive Tree Metrics

Natural Language Processing 1. lecture 7: constituent parsing. Ivan Titov. Institute for Logic, Language and Computation

Passing-Bablok Regression for Method Comparison

Statistical NLP Spring, Lecture 6: POS / Phrase MT. Parts-of-Speech (English); Part-of-Speech Ambiguity; Why POS Tagging?; Classic Solution: HMMs

Structured language modeling

Probabilistic Context-free Grammars

Lecture 7: Introduction to syntax-based MT

Linking Theorems for Tree Transducers

Empirical methods in NLP

Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation

Statistical NLP Fall, Lecture 9: Dependency Parsing. (Non-)Projectivity; CoNLL Format

Syntax-Based Decoding

READING. Skill 1.2: Using figurative expressions. Skill 2.1: Identify explicit and implicit main ideas

Effectiveness of complex index terms in information retrieval

Natural Language Processing (CSE 517): Sequence Models

Chapter 3: Describing Data, Numerical Measures. Examples

Exercise 1: Basics of probability calculus

CSE 447 / 547 Natural Language Processing Winter 2018

Today. Finish word sense disambiguation. Midterm Review

Total registrants at 31/12/2014, by profession (AS, BS, CH) and gender (F, M, U, Total)

Introduction to Probabilistic Graphical Models

Transcription:

[Figure: Geographical distribution of English tweets (gender-induced data): proportion of gendered tweets in English, May 2016. Many tweets from Denmark and Norway are in English; fewer from rural Sweden, Finland, or Iceland (Iceland values averaged across all provinces).]
Outline: 1 Contexts of the Present Research; 2 Data Collection and Processing (Data Collection, Gender Disambiguation, PoS Tagging); 3 Analysis (Language Profile, Correlation of Grammatical Features and Gender, Principal Components Analysis, Feature Dispersion); 4 Summary and Conclusion. (Coats, Grammar and Gender in Nordic Twitter English, 2016)

[Figure: t-test statistics per PoS feature, with dashed thresholds at p = 0.05; positive values mark a feature as more male, negative as more female. Features: '' LRB RRB , . : CC CD DT HT IN JJ JJR JJS MD NN NNP NNS PRP PRP$ RB RP TO UH USR VB VBD VBG VBN VBP VBZ WDT WP WRB.]

[Figure: distributions of per-user feature proportions (percent of each user's tokens), females vs. males. Summary statistics per panel:]

Feature                                  Mean (f/m)      Median (f/m)   Std. dev (f/m)   t-test p   Cohen's d
period                                   5.40 / 6.95     3.07 / 5.71    7.35 / 7.71      0          0.20
determiner                               4.40 / 4.94     3.45 / 4.35    5.18 / 5.32      0.002      0.10
preposition                              6.61 / 7.17     6.25 / 6.93    6.17 / 6.35      0.007      0.09
proper noun                              12.32 / 13.78   6.67 / 8.00    16.51 / 16.61    0.009      0.09
personal pronoun                         5.83 / 4.28     3.85 / 2.05    7.52 / 5.77      0          0.24
adverb                                   3.81 / 3.25     0 / 0          6.02 / 5.09      0.004      0.10
interjection                             7.16 / 4.84     1.65 / 0       12.72 / 9.70     0          0.21
username                                 6.97 / 8.24     1.13 / 3.85    11.15 / 12.31    0.001      0.11
verb, non-3rd person singular present    2.60 / 1.92     0 / 0          4.68 / 3.69      0          0.17

[Figure: PCA of gendered subcorpora, components 1 and 2. PC1 explains 58.92% of the variance, PC2 11.91%. Points: da.f, da.m, fi.f, fi.m, is.f, is.m, no.f, no.m, sv.f, sv.m.]

[Figures: PC1 score (roughly -0.04 to 0.04) versus frequency per 1,000 tokens for each gendered subcorpus (da.f, da.m, fi.f, fi.m, is.f, is.m, no.f, no.m, sv.f, sv.m), one panel per feature: period (.?!); number; proper noun; personal pronoun; possessive pronoun; adverb; interjection; verb (base form); verb (past participle); verb (non-3rd person singular present).]
