Demographic bias in social media language analysis: a case study of African-American English

Visiting speaker
Past Talk
Brendan O'Connor
Assistant professor in the College of Information and Computer Sciences at the University of Massachusetts, Amherst
Dec 14, 2016
12:00 pm
In-person
4 Thomas More St
London E1W 1YW, UK
The Roux Institute
100 Fore Street
Portland, ME 04101
Network Science Institute
2nd floor
Network Science Institute
11th floor
177 Huntington Ave
Boston, MA 02115
Network Science Institute
2nd floor
58 St Katharine's Way
London E1W 1LP, UK

What can text analysis tell us about society? Corpora of news, social media, and historical documents record events, beliefs, and culture. Natural language processing and machine learning methods hold great promise for exploring this type of data. At the same time, our current NLP methods are confounded by social variables: does NLP fairly analyze language from different social groups? We conduct a case study of dialectal language in online conversational text by investigating African-American English (AAE) on Twitter, using a demographically supervised model to identify AAE-like language associated with geo-located messages. We verify that this language follows well-known AAE linguistic phenomena. Furthermore, existing tools for language identification, part-of-speech tagging, and dependency parsing fail on this AAE-like language more often than on text associated with white speakers. We leverage our model to fix racial bias in some of these tools, and discuss future implications for fairness and artificial intelligence.

About the speaker
Brendan O'Connor (http://brenocon.com/) is an assistant professor in the College of Information and Computer Sciences at the University of Massachusetts, Amherst. What can statistical text analysis tell us about society? Prof. O'Connor works in computational social science, developing natural language processing, machine learning, and user interfaces to support scientific investigation of political and social trends; for example, analyzing opinions and slang on Twitter, censorship in Chinese microblogs, and political events reported in the news. His work has been featured in the New York Times and the Wall Street Journal. He received his PhD in 2014 from Carnegie Mellon University's Machine Learning Department, advised by Noah Smith, and has previously been a Visiting Fellow at the Harvard Institute for Quantitative Social Science and an intern with the Facebook Data Science team. Before graduate school, he worked on crowdsourcing at CrowdFlower / Dolores Labs and on natural language search at Powerset. He holds a BS/MS in Symbolic Systems from Stanford University.