TY - JOUR
T1 - Automatic classification of Chinese programming MOOC reviews using fine-tuned BERTs and GPT-augmented data
AU - Chen, Xieling
AU - Xie, Haoran
AU - Zou, Di
AU - Xu, Lingling
AU - Wang, Fu Lee
N1 - Publisher Copyright:
© (2025), (International Forum of Educational Technology and Society). All rights reserved.
PY - 2025
Y1 - 2025
N2 - In massive open online course (MOOC) environments, computer-based analysis of course reviews enables instructors and course designers to develop intervention strategies and improve instruction in support of learners. This study aimed to automatically and effectively identify the topics learners are concerned with in their written reviews. First, we examined the distribution of topics in 13,660 reviews of a Chinese programming MOOC and identified “instructional skills,” “perceived course value,” “instructor characteristics,” and “perceived course difficulty” as learners’ primary concerns. Second, we proposed a GPTaug-BERT model that integrates fine-tuned bidirectional encoder representations from Transformers (BERT) models with augmented data generated by generative pre-trained Transformers (GPT), and applied it to classify learners’ topics of concern automatically. Results showed that, compared with machine learning and other deep learning architectures, the GPTaug-BERT model improved the F1 score on the MOOC review topic recognition task by 7%. Third, we compared the GPTaug-BERT model with the BERT-Chinese model in distinguishing between topics. The GPTaug-BERT model performed better, achieving accuracy above 67% in every category, including “online programming tools,” “feedback and problem-solving,” and “course structure,” which the BERT-Chinese model largely misclassified. The findings offer insights into the effectiveness of combining fine-tuned BERT models with GPT-augmented data for accurate topic identification from MOOC reviews.
KW - BERT
KW - Data augmentation
KW - GPT
KW - Massive open online courses
KW - Multilabel classification
UR - http://www.scopus.com/inward/record.url?scp=85218262598&partnerID=8YFLogxK
U2 - 10.30191/ETS.202501_28(1).TP01
DO - 10.30191/ETS.202501_28(1).TP01
M3 - Article
AN - SCOPUS:85218262598
SN - 1176-3647
VL - 28
SP - 230
EP - 249
JO - Educational Technology and Society
JF - Educational Technology and Society
IS - 1
ER -