Authors: Heeryung Choi, Tung Phung, Mengyan Wu, Adish Singla, Christopher Brooks
Abstract: Generative AI tools, such as AI-generated hints, are increasingly integrated into programming education to offer timely, personalized support. However, little is known about how to leverage these hints effectively while ensuring autonomous and meaningful learning. One promising approach pairs AI-generated hints with reflection prompts that ask students to review and analyze their learning when they request a hint. This study investigates the interplay between AI-generated hints and different designs of reflection prompts in an online introductory programming course. We conducted a two-trial field experiment. In Trial 1, students were randomly assigned to receive prompts either before or after receiving hints, or no prompt at all. Each prompt also targeted one of three self-regulated learning (SRL) phases: planning, monitoring, and evaluation. In Trial 2, we examined two types of prompt guidance: directed (offering more explicit and structured guidance) and open (offering more general and less constrained guidance). Findings show that students in the before-hint (RQ1), planning (RQ2), and directed (RQ3) prompt groups produced higher-quality reflections but reported lower satisfaction with AI-generated hints than students in the other conditions. Immediate performance did not differ across conditions. This negative relationship between reflection quality and hint satisfaction aligns with prior work on student mental effort and satisfaction. Our results highlight the need to reconsider how AI models are trained and evaluated for education, as prioritizing user satisfaction can undermine deeper learning.