What can text analysis tell us about society? Corpora of news, social media, and historical documents record events, beliefs, and culture. Natural language processing and machine learning methods hold great promise for exploring this type of data. At the same time, our current NLP methods are confounded by social variables -- does NLP fairly analyze language from different social groups? We conduct a case study of dialectal language in online conversational text by investigating African-American English (AAE) on Twitter, using a demographically supervised model to identify AAE-like language associated with geo-located messages. We verify that this language follows well-known AAE linguistic phenomena -- and furthermore, that existing tools for language identification, part-of-speech tagging, and dependency parsing fail on this AAE-like language more often than on text associated with white speakers. We leverage our model to fix racial bias in some of these tools, and discuss future implications for fairness and artificial intelligence.
Brendan O'Connor (http://brenocon.com/) is an assistant professor in the College of Information and Computer Sciences at the University of Massachusetts, Amherst. What can statistical text analysis tell us about society? Prof. O'Connor works in computational social science, developing natural language processing, machine learning, and user interfaces to support scientific investigation of political and social trends; for example, analyzing opinions and slang on Twitter, censorship in Chinese microblogs, and political events reported in the news. His work has been featured in the New York Times and the Wall Street Journal. He received his PhD in 2014 from Carnegie Mellon University's Machine Learning Department, advised by Noah Smith, and has previously been a Visiting Fellow at the Harvard Institute for Quantitative Social Science and an intern with the Facebook Data Science team. Before graduate school, he worked on crowdsourcing at CrowdFlower / Dolores Labs, and natural language search at Powerset. He holds a BS/MS in Symbolic Systems from Stanford University.