Oren Tsur
State-of-the-art Natural Language Processing (NLP) systems are trained on massive collections of data. Traditionally, NLP models are uni-modal: a single form of data, e.g., text, is used for training. Recent trends, however, focus on multimodality, utilizing multiple forms of data to improve performance on classic tasks as well as to broaden the capabilities of AI systems. Images and code are two common additional modalities used in training popular tools such as OpenAI's GPT and Google's Gemini, among other LLMs.

Language, however, is not merely a collection of stand-alone texts, nor texts grounded in images or aligned with code. Language is primarily used for communication between speakers in a social setting. The meaning (semantic and pragmatic) of a specific utterance is best understood by interlocutors who share some common ground and are aware of the context in which the communication takes place. In this talk I will demonstrate the benefits of a multi-modal framework in which social context serves as an additional modality, through three unique tasks: conversational stance detection, the detection of hate mongers, and the modeling of distributed large-scale coordinated campaigns.