Improving the Generalizability of Models of Collaborative Discourse

Abstract

We investigated methods to enhance the generalizability of large language models (LLMs) designed to classify dimensions of collaborative discourse during small group work. Our research utilized five diverse datasets that spanned various grade levels, demographic groups, collaboration settings, and curriculum units. We explored different model training techniques with RoBERTa and Mistral LLMs, including traditional fine-tuning, data augmentation paired with fine-tuning, and prompting. Our findings revealed that traditional fine-tuning of RoBERTa on a single dataset (serving as our baseline) led to overfitting, with the model failing to generalize beyond the training data’s specific curriculum and language patterns. In contrast, fine-tuning RoBERTa with embedding-augmented data led to significant improvements in generalization, as did pairing Mistral embeddings with a support vector machine classifier. However, fine-tuning and few-shot prompting Mistral did not yield similar improvements. Our findings highlight scalable alternatives to the resource-intensive process of curating labeled datasets for each new application, offering practical strategies to enhance model adaptability in diverse educational settings.
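As a rough illustration of the embedding-plus-classifier approach mentioned above, the sketch below pairs pre-computed utterance embeddings with a support vector machine in scikit-learn. The file names, the embedding source, and the hyperparameters are assumptions made for illustration only, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): classify collaborative-discourse
# labels by feeding pre-computed LLM utterance embeddings to an SVM.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical inputs: one embedding vector per utterance (e.g., extracted
# from a Mistral model) and one label per utterance for a single dimension.
train_embeddings = np.load("train_embeddings.npy")  # shape: (n_train, dim)
train_labels = np.load("train_labels.npy")          # shape: (n_train,)
test_embeddings = np.load("test_embeddings.npy")    # held-out dataset

# Standardize features, then fit an RBF-kernel SVM on the training embeddings.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(train_embeddings, train_labels)

# Predict discourse labels for the held-out (cross-dataset) utterances.
predictions = clf.predict(test_embeddings)
```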

Keywords

Generalization, Natural language processing, Collaboration analytics