The AI Tutor in Engineering Education: Design, Results, and Redesign of an Experience in Hydrology at an Argentine University

Authors: Hugo Roger Paz

Abstract: The emergence of Generative Artificial Intelligence (GenAI) has reshaped higher education, presenting both opportunities and ethical-pedagogical challenges. This article presents an empirical case study on the complete cycle (design, initial failure, redesign, and re-evaluation) of an intervention using an AI Tutor (ChatGPT) in the “Hydrology and Hydraulic Works” course (Civil Engineering, UTN-FRT, Argentina). The study documents two interventions in the same cohort (n=23). The first resulted in widespread failure (0% pass rate) due to superficial use and serious academic integrity issues (65% similarity, copies > 80%). This failure forced a comprehensive methodological redesign. The second intervention, based on a redesigned prompt (Prompt V2) with strict evidence controls (mandatory Appendix A with exported chat, minimum time ≥ 120 minutes, verifiable numerical exercise) and a refined rubric (Rubric V2), showed significantly better results: a median score of 88/100 and verifiable compliance with genuine interaction processes. Using a mixed-methods approach (reproducible document analysis and rubric analysis), the impact of the redesign on integrity and technical performance is evaluated. The results demonstrate that, without explicit process controls, students prioritize efficiency over deep learning, submitting documents without real traceability. A transferable assessment protocol for STEM courses is proposed, centered on “auditable personal zones,” to foster higher-order thinking. The study provides key empirical evidence from the context of a public Latin American university.

Link: https://arxiv.org/abs/2510.22279

Hybrid Instructor AI Assessment In Academic Projects: Efficiency, Equity, And Methodological Lessons

Authors: Hugo Roger Paz

Abstract: In technical subjects characterized by high enrollment, such as Basic Hydraulics, the assessment of reports necessitates superior levels of objectivity, consistency, and formative feedback, goals often compromised by faculty workload. This study presents the implementation of a generative artificial intelligence (AI)-assisted assessment system, supervised by instructors, to grade 33 hydraulics reports. The central objective was to quantify its impact on the efficiency, quality, and fairness of the process. The employed methodology included the calibration of the Large Language Model (LLM) with a detailed rubric, the batch processing of assignments, and a human-in-the-loop validation phase. The quantitative results revealed a noteworthy 88% reduction in grading time (from 50 to 6 minutes per report, including verification) and a 733% increase in productivity. The quality of feedback was substantially improved, evidenced by 100% rubric coverage and a 150% increase in the anchoring of comments to textual evidence. The system proved to be equitable, exhibiting no bias related to report length, and highly reliable post-calibration (r = 0.96 between scores). It is concluded that the hybrid AI-instructor model optimizes the assessment process, thereby liberating time for high-value pedagogical tasks and enhancing the fairness and quality of feedback, in alignment with UNESCO’s principles on the ethical use of AI in education.

Link: https://arxiv.org/abs/2510.22286
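
Note: The two efficiency figures in the abstract above (88% less grading time, 733% more productivity) both follow from the same pair of numbers, 50 and 6 minutes per report. The short sketch below reproduces that arithmetic. The function name `grading_efficiency` is hypothetical, and reading "productivity" as reports graded per unit of time is an assumption, since the abstract does not define it.

```python
def grading_efficiency(before_min, after_min):
    """Return (time reduction %, productivity increase %) for a per-report time change.

    Assumes productivity means reports graded per unit of time,
    so it scales with 1 / minutes-per-report.
    """
    reduction_pct = (before_min - after_min) / before_min * 100
    productivity_gain_pct = (before_min / after_min - 1) * 100
    return reduction_pct, productivity_gain_pct


# Figures taken from the abstract: 50 minutes before, 6 minutes after.
reduction, gain = grading_efficiency(50, 6)
print(f"time reduction: {reduction:.0f}%, productivity increase: {gain:.0f}%")
```

Under that reading, both percentages describe the same change from two angles: time per report falls by 44/50, while throughput rises by a factor of 50/6.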

Exploring the Applications of Generative AI in High School STEM Education

Authors: Ishaan Masilamony

Abstract: In recent years, ChatGPT and Microsoft Copilot have become subjects of great discourse, particularly in the field of education. Prior research has hypothesized about the potential impacts these tools could have on student learning and performance. These hypotheses have primarily relied on trends from prior applications of technology in education and an understanding of the limitations and strengths of Generative AI in other applications. This study utilizes an experimental approach to analyze the impacts of Generative AI on high school STEM education (physics in particular). In accordance with most findings, generative AI does have some positive impact on student performance. However, our findings show that the most significant impact is an increase in student engagement with the subject.

Link: https://arxiv.org/abs/2510.21718

Benchmarking Large Language Models for Personalized Guidance in AI-Enhanced Learning

Authors: Bo Yuan, Jiazi Hu

Abstract: While Large Language Models (LLMs) are increasingly envisioned as intelligent assistants for personalized learning, systematic head-to-head evaluations in authentic learning scenarios remain scarce. This study presents an empirical comparison of three state-of-the-art LLMs on a tutoring task simulating a realistic learning setting. Using a dataset containing a student’s responses to ten mixed-format questions with correctness labels, each model was asked to (i) analyze the quiz to identify underlying knowledge components, (ii) infer the student’s mastery profile, and (iii) generate targeted guidance for improvement. To mitigate subjectivity and evaluator bias, Gemini was employed as a virtual judge to perform pairwise comparisons across multiple dimensions: accuracy, clarity, actionability, and appropriateness. Results analyzed via the Bradley-Terry model reveal that GPT-4o is generally preferred, producing feedback that is more informative and better structured than its counterparts, whereas DeepSeek-V3 and GLM-4.5 demonstrate intermittent strengths but lower consistency. These findings highlight the feasibility of deploying LLMs as advanced teaching assistants for individualized support and provide methodological insights for subsequent empirical research on LLM-driven personalized learning.

Link: https://arxiv.org/abs/2509.05346
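
Note: The abstract above ranks models by feeding pairwise judge preferences into a Bradley-Terry model. As a rough illustration of what that involves, the sketch below fits Bradley-Terry strengths with the standard minorization-maximization update. It is not the authors' code: the function name `fit_bradley_terry`, the three anonymous models, and the win counts are all made up for illustration and are not results from the paper.

```python
def fit_bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths from a pairwise win-count matrix.

    wins[i][j] is the number of comparisons in which item i was preferred
    over item j. Every item must appear in at least one comparison.
    Returns strengths normalized to sum to 1.
    """
    n = len(wins)
    strengths = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Each pair contributes (number of comparisons) / (p_i + p_j).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        norm = sum(updated)
        strengths = [s / norm for s in updated]
    return strengths


# Hypothetical judge outcomes for three unnamed models (10 comparisons per pair).
wins = [
    [0, 7, 8],  # model_a preferred over model_b 7 times, over model_c 8 times
    [3, 0, 6],  # model_b
    [2, 4, 0],  # model_c
]
for name, s in zip(["model_a", "model_b", "model_c"], fit_bradley_terry(wins)):
    print(f"{name}: {s:.3f}")
```

Under the model, the estimated probability that item i is preferred over item j is p_i / (p_i + p_j), so the fitted strengths give a single ranking even when individual pairwise outcomes are noisy.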

LLMBridge: Reducing Costs to Access LLMs in a Prompt-Centric Internet

Authors: Noah Martin, Abdullah Bin Faisal, Hiba Eltigani, Rukhshan Haroon, Swaminathan Lamelas, Fahad Dogar

Abstract: Today’s Internet infrastructure is centered around content retrieval over HTTP, with middleboxes (e.g., HTTP proxies) playing a crucial role in performance, security, and cost-effectiveness. We envision a future where Internet communication will be dominated by “prompts” sent to generative AI models. For this, we will need proxies that provide similar functions to HTTP proxies (e.g., caching, routing, compression) while dealing with unique challenges and opportunities of prompt-based communication. As a first step toward supporting prompt-based communication, we present LLMBridge, an LLM proxy designed for cost-conscious users, such as those in developing regions and education (e.g., students, instructors). LLMBridge supports three key optimizations: model selection (routing prompts to the most suitable model), context management (intelligently reducing the amount of context), and semantic caching (serving prompts using local models and vector databases). These optimizations introduce trade-offs between cost and quality, which applications navigate through a high-level, bidirectional interface. As case studies, we deploy LLMBridge in two cost-sensitive settings: a WhatsApp-based Q&A service and a university classroom environment. The WhatsApp service has been live for over twelve months, serving 100+ users and handling more than 14.7K requests. In parallel, we exposed LLMBridge to students across three computer science courses over a semester, where it supported diverse LLM-powered applications – such as reasoning agents and chatbots – and handled an average of 500 requests per day. We report on deployment experiences across both settings and use the collected workloads to benchmark the effectiveness of various cost-optimization strategies, analyzing their trade-offs in cost, latency, and response quality.

Link: https://arxiv.org/abs/2410.11857
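
Note: Of the three optimizations the abstract above lists, semantic caching is the easiest to picture in code: a new prompt is answered from the cache when it is close enough in meaning to one already seen. The sketch below is a toy illustration of that idea only. It is not LLMBridge's implementation or interface, which the abstract describes as using local models and vector databases. Here a bag-of-words count vector stands in for a real embedding, and the class name `SemanticCache`, the `threshold` parameter, and its default value are assumptions.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    """Return a stored response when a new prompt is similar enough to an old one."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed cut-off; would need tuning in practice
        self.entries = []  # list of (vector, response) pairs

    def store(self, prompt, response):
        self.entries.append((embed(prompt), response))

    def lookup(self, prompt):
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        # A miss (None) means the prompt should be forwarded to a model.
        return best_response if best_score >= self.threshold else None


cache = SemanticCache()
cache.store("what is the capital of France", "Paris")
print(cache.lookup("What is the capital of France"))  # similar prompt: cache hit
print(cache.lookup("how do I bake bread"))            # unrelated prompt: cache miss
```

The threshold is where the cost and quality trade-off mentioned in the abstract shows up: a lower value serves more prompts from the cache but risks returning an answer to a different question.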
