Beyond Utterance: Understanding Group Problem Solving through Discussion Sequences

Zhuoxu Duan, Zhengye Yang, Brooke Foucault Welles, Richard J. Radke
Proceedings of the 27th International Conference on Multimodal Interaction
October 12, 2025

Automatically understanding and facilitating effective group collaboration remains a core challenge across social science and computational research. While prior work has focused on fine-grained social cues or coarse behavioral patterns, it is also critical to understand the intermediate structure of dialogue: how sequences of utterances (discussion segments) reflect evolving group knowledge. This paper introduces a novel discussion segmentation framework and taxonomy for modeling collaborative problem-solving (CPS) processes, classifying segments into categories such as “task progress”, “task attempt”, and “grounding”. Based on this taxonomy, we collected and annotated over 1,700 multimodal discussion segments from 21 group discussions held both in person and online. We further propose a baseline model that integrates audio, visual, and textual signals to classify discussion segments with an average F1 score of 69.3%. Notably, this lightweight expert model achieves performance comparable to, and sometimes exceeding, that of proprietary state-of-the-art multimodal large language models. These findings highlight the promise of sequence-level discourse analysis for automated facilitation and human-agent collaboration.
