Brendan O'Connor
What can text analysis tell us about society? Corpora of news, social media, and historical documents record events, beliefs, and culture. Natural language processing (NLP) and machine learning methods hold great promise for exploring this type of data. At the same time, our current NLP methods are confounded by social variables: does NLP analyze language from different social groups fairly? We conduct a case study of dialectal language in online conversational text, focusing on African-American English (AAE) on Twitter. Using a demographically supervised model, we identify AAE-like language associated with geolocated messages. We verify that this language follows well-known AAE linguistic phenomena. Furthermore, existing tools such as language identification, part-of-speech tagging, and dependency parsing fail more often on this AAE-like language than on text associated with white speakers. We leverage our model to fix racial bias in some of these tools, and discuss future implications for fairness and artificial intelligence.