Benchmarking Large Language Models for Personalized Guidance in AI-Enhanced Learning

Authors: Bo Yuan, Jiazi Hu

Abstract: While Large Language Models (LLMs) are increasingly envisioned as intelligent assistants for personalized learning, systematic head-to-head evaluations in authentic learning scenarios remain scarce. This study presents an empirical comparison of three state-of-the-art LLMs on a tutoring task simulating a realistic learning setting. Using a dataset containing a student’s responses to ten mixed-format questions with correctness labels, each model was asked to (i) analyze the quiz to identify underlying knowledge components, (ii) infer the student’s mastery profile, and (iii) generate targeted guidance for improvement. To mitigate subjectivity and evaluator bias, Gemini was employed as a virtual judge to perform pairwise comparisons across multiple dimensions: accuracy, clarity, actionability, and appropriateness. Results analyzed via the Bradley-Terry model reveal that GPT-4o is generally preferred, producing feedback that is more informative and better structured than its counterparts, whereas DeepSeek-V3 and GLM-4.5 demonstrate intermittent strengths but lower consistency. These findings highlight the feasibility of deploying LLMs as advanced teaching assistants for individualized support and provide methodological insights for subsequent empirical research on LLM-driven personalized learning.

Link: https://arxiv.org/abs/2509.05346
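
The aggregation step described above, pairwise judge verdicts turned into a ranking with the Bradley-Terry model, is simple enough to sketch. The snippet below is a minimal illustration rather than the authors' code: the model names follow the paper, but the win counts are invented placeholders, and the fitting loop is the standard minorization-maximization procedure.

```python
# Minimal Bradley-Terry fit over pairwise judge verdicts (illustrative sketch only).
# The win counts below are placeholders, not results reported in the paper.
import numpy as np

models = ["GPT-4o", "DeepSeek-V3", "GLM-4.5"]
# wins[i, j] = number of pairwise comparisons the judge awarded to model i over model j
wins = np.array([
    [0, 7, 8],   # hypothetical counts for illustration
    [3, 0, 6],
    [2, 4, 0],
], dtype=float)

n = wins + wins.T                 # total comparisons per pair
p = np.ones(len(models))          # initial strength estimates

# Cyclic minorization-maximization updates for Bradley-Terry strengths
for _ in range(200):
    for i in range(len(models)):
        denom = sum(n[i, j] / (p[i] + p[j]) for j in range(len(models)) if j != i)
        p[i] = wins[i].sum() / denom
    p /= p.sum()                  # normalize so strengths sum to 1

# A higher strength means the judge preferred that model more often overall
for name, strength in sorted(zip(models, p), key=lambda x: -x[1]):
    print(f"{name}: strength = {strength:.3f}")
```

With per-dimension verdicts (accuracy, clarity, actionability, appropriateness), the same fit can be run separately for each dimension to see where a model's advantage comes from.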

LLMBridge: Reducing Costs to Access LLMs in a Prompt-Centric Internet

Authors: Noah Martin, Abdullah Bin Faisal, Hiba Eltigani, Rukhshan Haroon, Swaminathan Lamelas, Fahad Dogar

Abstract: Today’s Internet infrastructure is centered around content retrieval over HTTP, with middleboxes (e.g., HTTP proxies) playing a crucial role in performance, security, and cost-effectiveness. We envision a future where Internet communication will be dominated by “prompts” sent to generative AI models. For this, we will need proxies that provide similar functions to HTTP proxies (e.g., caching, routing, compression) while dealing with the unique challenges and opportunities of prompt-based communication. As a first step toward supporting prompt-based communication, we present LLMBridge, an LLM proxy designed for cost-conscious users, such as those in developing regions and education (e.g., students, instructors). LLMBridge supports three key optimizations: model selection (routing prompts to the most suitable model), context management (intelligently reducing the amount of context), and semantic caching (serving prompts using local models and vector databases). These optimizations introduce trade-offs between cost and quality, which applications navigate through a high-level, bidirectional interface. As case studies, we deploy LLMBridge in two cost-sensitive settings: a WhatsApp-based Q&A service and a university classroom environment. The WhatsApp service has been live for over twelve months, serving 100+ users and handling more than 14.7K requests. In parallel, we exposed LLMBridge to students across three computer science courses over a semester, where it supported diverse LLM-powered applications – such as reasoning agents and chatbots – and handled an average of 500 requests per day. We report on deployment experiences across both settings and use the collected workloads to benchmark the effectiveness of various cost-optimization strategies, analyzing their trade-offs in cost, latency, and response quality.

Link: https://arxiv.org/abs/2410.11857
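
Of the three optimizations, semantic caching is concrete enough to illustrate with a toy sketch. The code below is not the LLMBridge implementation; embed() and call_llm() are hypothetical stand-ins, and the 0.85 similarity threshold is an arbitrary illustrative value. It shows the same idea the abstract describes: embed the incoming prompt, return a cached response when a sufficiently similar prompt has been answered before, and forward to a remote model only on a miss.

```python
# Toy semantic cache in the spirit of LLMBridge's caching optimization (sketch only).
# embed() and call_llm() are hypothetical stand-ins, not APIs from the paper.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding hook; swap in any real sentence-embedding model."""
    rng = np.random.default_rng(sum(text.encode()))   # toy, text-derived vector
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)

def call_llm(prompt: str) -> str:
    """Stand-in for the expensive remote LLM request the cache tries to avoid."""
    return f"<remote response to: {prompt}>"

class SemanticCache:
    def __init__(self, threshold: float = 0.85):
        self.threshold = threshold        # cosine-similarity cutoff for a cache hit
        self.keys: list[np.ndarray] = []  # embeddings of previously seen prompts
        self.values: list[str] = []       # their cached responses

    def query(self, prompt: str) -> str:
        q = embed(prompt)
        if self.keys:
            sims = [float(q @ k) for k in self.keys]   # unit vectors, so dot = cosine
            best = int(np.argmax(sims))
            if sims[best] >= self.threshold:
                return self.values[best]               # hit: serve locally, no remote cost
        response = call_llm(prompt)                    # miss: pay for the remote model
        self.keys.append(q)
        self.values.append(response)
        return response

cache = SemanticCache()
print(cache.query("Explain TCP congestion control"))   # miss, goes to the remote model
print(cache.query("Explain TCP congestion control"))   # repeated prompt, served from cache
```

In a real deployment the linear scan over cached embeddings would be handled by a vector database, and the hit-rate-versus-quality trade-off set by the threshold is the kind of cost/quality knob the paper's high-level interface lets applications navigate.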

Designing Knowledge Tools: How Students Transition from Using to Creating Generative AI in the STEAM Classroom

Authors: Qian Huang, Nachamma Sockalingam, Thijs Willems, King Wang Poon

Abstract: This study explores how graduate students in an urban planning program transitioned from passive users of generative AI to active creators of custom GPT-based knowledge tools. Drawing on Self-Determination Theory (SDT), which emphasizes the psychological needs of autonomy, competence, and relatedness as foundations for intrinsic motivation, the research investigates how the act of designing AI tools influences students’ learning experiences, identity formation, and engagement with knowledge. The study is situated within a two-term curriculum, where students first used instructor-created GPTs to support qualitative research tasks and later redesigned these tools to create their own custom applications, including the Interview Companion GPT. Using qualitative thematic analysis of student slide presentations and focus group interviews, the findings highlight a marked transformation in students’ roles and mindsets. Students reported feeling more autonomous as they chose the functionality, design, and purpose of their tools, more competent through the acquisition of AI-related skills such as prompt engineering and iterative testing, and more connected to peers through team collaboration and a shared sense of purpose. The study contributes to a growing body of evidence that student agency can be powerfully activated when learners are invited to co-design the very technologies they use. The shift from AI tool users to AI tool designers reconfigures students’ relationships with technology and knowledge, transforming them from consumers into co-creators in an evolving educational landscape.

Link: https://arxiv.org/abs/2510.19405

Directive, Metacognitive or a Blend of Both? A Comparison of AI-Generated Feedback Types on Student Engagement, Confidence, and Outcomes

Authors: Omar Alsaiari, Nilufar Baghaei, Jason M. Lodge, Omid Noroozi, Dragan Gašević, Marie Boden, Hassan Khosravi

Abstract: Feedback is one of the most powerful influences on student learning, with extensive research examining how best to implement it in educational settings. Increasingly, feedback is being generated by artificial intelligence (AI), offering scalable and adaptive responses. Two widely studied approaches are directive feedback, which gives explicit explanations and reduces cognitive load to speed up learning, and metacognitive feedback, which prompts learners to reflect, track their progress, and develop self-regulated learning (SRL) skills. While both approaches have clear theoretical advantages, their comparative effects on engagement, confidence, and quality of work remain underexplored. This study presents a semester-long randomised controlled trial with 329 students in an introductory design and programming course using an adaptive educational platform. Participants were assigned to receive directive, metacognitive, or hybrid AI-generated feedback that blended elements of both directive and metacognitive feedback. Results showed that revision behaviour differed across feedback conditions, with the Hybrid condition prompting the most revisions compared to Directive and Metacognitive. Confidence ratings were uniformly high, and resource quality outcomes were comparable across conditions. These findings highlight the promise of AI in delivering feedback that balances clarity with reflection. Hybrid approaches, in particular, show potential to combine actionable guidance for immediate improvement with opportunities for self-reflection and metacognitive growth.

Link: https://arxiv.org/abs/2510.19685

Discovering the curriculum with AI: A proof-of-concept demonstration with an intelligent tutoring system for teaching project selection

Authors: Lovis Heindrich, Falk Lieder

Abstract: The decisions of individuals and organizations are often suboptimal because fully rational decision-making is too demanding in the real world. Recent work suggests that some errors can be prevented by leveraging artificial intelligence to discover and teach clever heuristics. So far, this line of research has been limited to simplified, artificial decision-making tasks. This article is the first to extend this approach to a real-world decision problem, namely, executives deciding which project their organization should launch next. We develop a computational method (MGPS) that automatically discovers project selection strategies that are optimized for real people, and we develop an intelligent tutor that teaches the discovered project selection procedures. We evaluated MGPS on a computational benchmark and tested the intelligent tutor in a training experiment with two control conditions. MGPS outperformed a state-of-the-art method and was more computationally efficient. Moreover, people who practiced with our intelligent tutor learned significantly better project selection strategies than the control groups. These findings suggest that AI could be used to automate the process of discovering and formalizing the cognitive strategies taught by intelligent tutoring systems.

Link: https://arxiv.org/abs/2406.04082
