TY - JOUR
T1 - Comparative Study of GenAI (ChatGPT) vs. Human in Generating Multiple Choice Questions Based on the PIRLS Reading Assessment Framework
AU - Lam, Yu Yan
AU - Chu, Samuel Kai Wah
AU - Ong, Elsie Li Chen
AU - Suen, Winnie Wing Lam
AU - Xu, Lingran
AU - Lam, Lavender Chin Lui
AU - Wong, Scarlett Man Yu
N1 - Publisher Copyright:
87th Annual Meeting of the Association for Information Science & Technology | Oct. 25–29, 2024 | Calgary, AB, Canada.
PY - 2024/10
Y1 - 2024/10
N2 - Human-generated multiple-choice questions (MCQs) are commonly used to ensure objective evaluation in education. However, generating high-quality questions is difficult and time-consuming. Generative artificial intelligence (GenAI) has emerged as an automated approach to question generation, but challenges remain in terms of biases and diversity in training data. This study aims to compare the quality of GenAI-generated MCQs with human-created ones. In Part 1 of this study, 16 MCQs were created separately by humans and by GenAI, in alignment with the Progress in International Reading Literacy Study (PIRLS) assessment framework. In Part 2, four assessors rated the quality of the generated MCQs on clarity, appropriateness, suitability, and alignment with PIRLS. Wilcoxon rank sum tests were conducted to compare GenAI-generated and human-generated MCQs. The findings highlight GenAI's potential, as its questions were difficult to distinguish from human-created ones, and offer recommendations for integrating AI technology in the future.
AB - Human-generated multiple-choice questions (MCQs) are commonly used to ensure objective evaluation in education. However, generating high-quality questions is difficult and time-consuming. Generative artificial intelligence (GenAI) has emerged as an automated approach to question generation, but challenges remain in terms of biases and diversity in training data. This study aims to compare the quality of GenAI-generated MCQs with human-created ones. In Part 1 of this study, 16 MCQs were created separately by humans and by GenAI, in alignment with the Progress in International Reading Literacy Study (PIRLS) assessment framework. In Part 2, four assessors rated the quality of the generated MCQs on clarity, appropriateness, suitability, and alignment with PIRLS. Wilcoxon rank sum tests were conducted to compare GenAI-generated and human-generated MCQs. The findings highlight GenAI's potential, as its questions were difficult to distinguish from human-created ones, and offer recommendations for integrating AI technology in the future.
KW - GenAI
KW - PIRLS
KW - Reading
KW - question assessment
KW - question creation
UR - http://www.scopus.com/inward/record.url?scp=85206810074&partnerID=8YFLogxK
U2 - 10.1002/pra2.1054
DO - 10.1002/pra2.1054
M3 - Article
AN - SCOPUS:85206810074
VL - 61
SP - 537
EP - 540
JO - Proceedings of the Association for Information Science and Technology
JF - Proceedings of the Association for Information Science and Technology
IS - 1
ER -