Directive, Metacognitive or a Blend of Both? A Comparison of AI-Generated Feedback Types on Student Engagement, Confidence, and Outcomes

Authors: Omar Alsaiari, Nilufar Baghaei, Jason M. Lodge, Omid Noroozi, Dragan Gašević, Marie Boden, Hassan Khosravi

Abstract: Feedback is one of the most powerful influences on student learning, with extensive research examining how best to implement it in educational settings. Increasingly, feedback is being generated by artificial intelligence (AI), offering scalable and adaptive responses. Two widely studied approaches are directive feedback, which gives explicit explanations and reduces cognitive load to speed up learning, and metacognitive feedback, which prompts learners to reflect, track their progress, and develop self-regulated learning (SRL) skills. While both approaches have clear theoretical advantages, their comparative effects on engagement, confidence, and quality of work remain underexplored. This study presents a semester-long randomised controlled trial with 329 students in an introductory design and programming course using an adaptive educational platform. Participants were assigned to receive directive, metacognitive, or hybrid AI-generated feedback, with the hybrid condition blending elements of the other two. Results showed that revision behaviour differed across feedback conditions, with the Hybrid condition prompting more revisions than the Directive and Metacognitive conditions. Confidence ratings were uniformly high, and resource quality outcomes were comparable across conditions. These findings highlight the promise of AI in delivering feedback that balances clarity with reflection. Hybrid approaches, in particular, show potential to combine actionable guidance for immediate improvement with opportunities for self-reflection and metacognitive growth.

Link: https://arxiv.org/abs/2510.19685
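
A minimal sketch of how the three feedback conditions might be operationalised as LLM prompt styles. The template wording, model name, and API call below are illustrative assumptions, not the study's actual templates or platform:

```python
# Hypothetical sketch: generating directive, metacognitive, or hybrid feedback
# with an LLM. Prompt wording and model choice are illustrative assumptions,
# not the templates used in the study.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

STYLE_INSTRUCTIONS = {
    "directive": (
        "Point out the specific problems in the student's resource and state "
        "exactly how to fix each one, in plain, actionable steps."
    ),
    "metacognitive": (
        "Do not give direct corrections. Instead, ask reflective questions that "
        "prompt the student to evaluate their own work, monitor their progress, "
        "and plan their next revision."
    ),
    "hybrid": (
        "First give brief, concrete suggestions for the most important fixes, "
        "then close with one or two reflective questions that encourage the "
        "student to review the rest of the resource themselves."
    ),
}

def generate_feedback(student_resource: str, condition: str) -> str:
    """Return AI-generated feedback for one of the three experimental conditions."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system", "content": STYLE_INSTRUCTIONS[condition]},
            {"role": "user", "content": f"Student-created resource:\n{student_resource}"},
        ],
    )
    return response.choices[0].message.content
```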

LLMBridge: Reducing Costs to Access LLMs in a Prompt-Centric Internet

Authors: Noah Martin, Abdullah Bin Faisal, Hiba Eltigani, Rukhshan Haroon, Swaminathan Lamelas, Fahad Dogar

Abstract: Today’s Internet infrastructure is centered around content retrieval over HTTP, with middleboxes (e.g., HTTP proxies) playing a crucial role in performance, security, and cost-effectiveness. We envision a future where Internet communication will be dominated by “prompts” sent to generative AI models. For this, we will need proxies that provide similar functions to HTTP proxies (e.g., caching, routing, compression) while dealing with the unique challenges and opportunities of prompt-based communication. As a first step toward supporting prompt-based communication, we present LLMBridge, an LLM proxy designed for cost-conscious users, such as those in developing regions and education (e.g., students, instructors). LLMBridge supports three key optimizations: model selection (routing prompts to the most suitable model), context management (intelligently reducing the amount of context), and semantic caching (serving prompts using local models and vector databases). These optimizations introduce trade-offs between cost and quality, which applications navigate through a high-level, bidirectional interface. As case studies, we deploy LLMBridge in two cost-sensitive settings: a WhatsApp-based Q&A service and a university classroom environment. The WhatsApp service has been live for over twelve months, serving 100+ users and handling more than 14.7K requests. In parallel, we exposed LLMBridge to students across three computer science courses over a semester, where it supported diverse LLM-powered applications – such as reasoning agents and chatbots – and handled an average of 500 requests per day. We report on deployment experiences across both settings and use the collected workloads to benchmark the effectiveness of various cost-optimization strategies, analyzing their trade-offs in cost, latency, and response quality.

Link: https://arxiv.org/abs/2410.11857
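
A minimal sketch of one of the three optimizations, semantic caching: a prompt is answered from a vector lookup over previously seen prompts when a close match exists, and only forwarded to a backend model otherwise. The embedding model, similarity threshold, and forward_to_llm stub are assumptions for illustration, not LLMBridge's actual interface:

```python
# Illustrative semantic cache for a prompt proxy: serve a new prompt from a cached
# response when embedding similarity is high enough, otherwise forward it.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder local embedding model

def forward_to_llm(prompt: str) -> str:
    # Stand-in for routing the prompt to a selected backend model (paid API or local LLM).
    return f"[model response to: {prompt[:40]}...]"

class SemanticCache:
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.embeddings: list[np.ndarray] = []
        self.responses: list[str] = []

    def lookup(self, prompt: str) -> str | None:
        if not self.embeddings:
            return None
        q = embedder.encode(prompt, normalize_embeddings=True)
        sims = np.stack(self.embeddings) @ q  # cosine similarity (unit-norm vectors)
        best = int(np.argmax(sims))
        return self.responses[best] if sims[best] >= self.threshold else None

    def insert(self, prompt: str, response: str) -> None:
        self.embeddings.append(embedder.encode(prompt, normalize_embeddings=True))
        self.responses.append(response)

def handle_request(cache: SemanticCache, prompt: str) -> str:
    cached = cache.lookup(prompt)
    if cached is not None:
        return cached  # served locally, no backend call or API cost
    response = forward_to_llm(prompt)
    cache.insert(prompt, response)
    return response
```

The threshold controls the cost-quality trade-off the abstract mentions: a lower threshold saves more backend calls but risks serving a response written for a different question.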

Benchmarking Large Language Models for Personalized Guidance in AI-Enhanced Learning

Authors: Bo Yuan, Jiazi Hu

Abstract: While Large Language Models (LLMs) are increasingly envisioned as intelligent assistants for personalized learning, systematic head-to-head evaluations in authentic learning scenarios remain scarce. This study presents an empirical comparison of three state-of-the-art LLMs on a tutoring task simulating a realistic learning setting. Using a dataset containing a student’s responses to ten mixed-format questions with correctness labels, each model was asked to (i) analyze the quiz to identify underlying knowledge components, (ii) infer the student’s mastery profile, and (iii) generate targeted guidance for improvement. To mitigate subjectivity and evaluator bias, Gemini was employed as a virtual judge to perform pairwise comparisons across multiple dimensions: accuracy, clarity, actionability, and appropriateness. Results analyzed via the Bradley-Terry model reveal that GPT-4o is generally preferred, producing feedback that is more informative and better structured than its counterparts, whereas DeepSeek-V3 and GLM-4.5 demonstrate intermittent strengths but lower consistency. These findings highlight the feasibility of deploying LLMs as advanced teaching assistants for individualized support and provide methodological insights for subsequent empirical research on LLM-driven personalized learning.

Link: https://arxiv.org/abs/2509.05346
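
A small sketch of the analysis step described in the abstract: fitting a Bradley-Terry model to pairwise judge preferences to rank the compared models. The win counts below are toy numbers, not the paper's data, and the minorization-maximization (Zermelo) updates are a standard fitting choice rather than necessarily the authors':

```python
# Estimate Bradley-Terry "strength" parameters from pairwise preference counts.
import numpy as np

models = ["GPT-4o", "DeepSeek-V3", "GLM-4.5"]
# wins[i, j] = number of pairwise comparisons in which model i was preferred over model j
wins = np.array([
    [0, 7, 8],
    [3, 0, 5],
    [2, 5, 0],
], dtype=float)

def bradley_terry(wins: np.ndarray, iters: int = 200) -> np.ndarray:
    n = wins.shape[0]
    p = np.ones(n)          # initial strengths
    total = wins + wins.T   # number of comparisons between each pair
    for _ in range(iters):
        for i in range(n):
            num = wins[i].sum()  # total wins of model i
            denom = sum(total[i, j] / (p[i] + p[j]) for j in range(n) if j != i)
            p[i] = num / denom
        p /= p.sum()  # normalize for identifiability
    return p

strengths = bradley_terry(wins)
for name, s in sorted(zip(models, strengths), key=lambda x: -x[1]):
    print(f"{name}: {s:.3f}")
```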

Discovering the curriculum with AI: A proof-of-concept demonstration with an intelligent tutoring system for teaching project selection

Authors: Lovis Heindrich, Falk Lieder

Abstract: The decisions of individuals and organizations are often suboptimal because fully rational decision-making is too demanding in the real world. Recent work suggests that some errors can be prevented by leveraging artificial intelligence to discover and teach clever heuristics. So far, this line of research has been limited to simplified, artificial decision-making tasks. This article is the first to extend this approach to a real-world decision problem, namely, executives deciding which project their organization should launch next. We develop a computational method (MGPS) that automatically discovers project selection strategies that are optimized for real people, and we develop an intelligent tutor that teaches the discovered project selection procedures. We evaluated MGPS on a computational benchmark and tested the intelligent tutor in a training experiment with two control conditions. MGPS outperformed a state-of-the-art method and was more computationally efficient. Moreover, people who practiced with our intelligent tutor learned significantly better project selection strategies than the control groups. These findings suggest that AI could be used to automate the process of discovering and formalizing the cognitive strategies taught by intelligent tutoring systems.

Link: https://arxiv.org/abs/2406.04082
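
For a flavour of the kind of procedure such a tutor might teach, here is a purely hypothetical lexicographic screening heuristic for project selection; the attributes, thresholds, and ordering are invented for this illustration and are not the strategy discovered by MGPS:

```python
# Toy project-selection heuristic: screen on one attribute at a time, then pick
# the best survivor. Invented for illustration only.
from dataclasses import dataclass

@dataclass
class Project:
    name: str
    strategic_fit: float    # 0..1, alignment with organizational goals
    expected_return: float  # projected return on investment
    risk: float             # 0..1, higher means riskier

def select_project(projects: list[Project]) -> Project:
    # Step 1: keep only projects clearing a minimum strategic-fit threshold.
    viable = [p for p in projects if p.strategic_fit >= 0.6] or projects
    # Step 2: among those, drop projects whose risk exceeds a tolerance level.
    tolerable = [p for p in viable if p.risk <= 0.5] or viable
    # Step 3: pick the remaining project with the highest expected return.
    return max(tolerable, key=lambda p: p.expected_return)

candidates = [
    Project("New product line", 0.8, 1.4, 0.4),
    Project("Market expansion", 0.9, 1.1, 0.7),
    Project("Process automation", 0.5, 1.8, 0.2),
]
print(select_project(candidates).name)  # "New product line"
```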

Contrastive Decoding Mitigates Score Range Bias in LLM-as-a-Judge

Authors: Yoshinari Fujinuma

Abstract: Large Language Models (LLMs) are commonly used as evaluators in various applications, but the reliability of the outcomes remains a challenge. One such challenge is using LLMs-as-judges for direct assessment, i.e., assigning scores from a specified range without any references. We first show that this challenge stems from LLM judge outputs being associated with score range bias, i.e., LLM judge outputs are highly sensitive to pre-defined score ranges, preventing the search for optimal score ranges. We also show that similar biases exist among models from the same family. We then mitigate this bias through contrastive decoding, achieving up to 11.3% relative improvement on average in Spearman correlation with human judgments across different score ranges.

Link: https://arxiv.org/abs/2510.18196
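
A generic sketch of contrastive decoding applied to direct assessment: the chosen score maximizes its log-probability under the full judge prompt minus a weighted log-probability under a contrast prompt. How the contrast distribution is formed and the weight alpha are assumptions for illustration, not necessarily the paper's exact formulation:

```python
# Pick the score whose probability under the full judge prompt is high relative to
# its probability under a contrast prompt, dampening a fixed preference for one score.
import math

def contrastive_score(
    expert_logprobs: dict[int, float],    # log P(score | full judge prompt)
    contrast_logprobs: dict[int, float],  # log P(score | contrast prompt, e.g. input withheld)
    alpha: float = 0.5,                   # illustrative contrast weight
) -> int:
    """Return the score in the allowed range maximizing the contrastive objective."""
    return max(
        expert_logprobs,
        key=lambda s: expert_logprobs[s] - alpha * contrast_logprobs[s],
    )

# Toy example over a 1-5 scale: the raw judge distribution piles mass on 4, but the
# contrast distribution shows much of that mass is there regardless of the input.
expert = {1: math.log(0.02), 2: math.log(0.08), 3: math.log(0.30),
          4: math.log(0.45), 5: math.log(0.15)}
contrast = {1: math.log(0.05), 2: math.log(0.10), 3: math.log(0.15),
            4: math.log(0.55), 5: math.log(0.15)}

print(contrastive_score(expert, contrast))  # prints 3, the contrastively preferred score
```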
