Welcome to the project repository for the Code In Place AI Evaluation Project (CIP-AIEP).
This project led to the paper:
Aligning Small Language Models for Programming Feedback:
Towards Scalable Coding Support in a Massive Global Course
To appear in the proceedings of SIGCSE TS 2026, St. Louis, Missouri.
Summary
- In this project, we trained a 3B-parameter small language model (SLM) to provide diagnostic feedback on students’ submissions to exam-like programming exercises.
- The model was deployed within Code In Place, a Massive Open Online Course (MOOC) that teaches thousands of learners worldwide the fundamentals of Python programming.
- The model was guided by rubric-based prompting and trained with a combination of supervised fine-tuning and preference-based optimization; a minimal sketch of the prompting step follows this list.
- Feedback quality was judged by over 50 teaching assistants.
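To make the rubric-based prompting concrete, below is a minimal, hypothetical sketch of how a grading rubric can be turned into a feedback prompt for the SLM. The rubric items, function name, and prompt wording are illustrative assumptions, not the prompts used in the deployed system.

```python
# Hypothetical sketch of rubric-based prompting.
# The rubric items and prompt wording are illustrative only, not the
# actual rubric or prompt template used in Code In Place.

RUBRIC = [
    "Does the program read the input correctly?",
    "Does the loop terminate under the right condition?",
    "Does the output match the specified format?",
]

def build_feedback_prompt(problem_statement: str, student_code: str) -> str:
    """Assemble a prompt asking the SLM for rubric-grounded diagnostic feedback."""
    rubric_lines = "\n".join(f"- {item}" for item in RUBRIC)
    return (
        "You are a teaching assistant giving diagnostic feedback on a student's code.\n\n"
        f"Problem:\n{problem_statement}\n\n"
        f"Student submission:\n{student_code}\n\n"
        "Evaluate the submission against each rubric item below and explain any issues "
        "constructively, without revealing a full solution:\n"
        f"{rubric_lines}\n"
    )

if __name__ == "__main__":
    # Example usage with a toy exercise and submission.
    print(build_feedback_prompt(
        "Print the numbers 1 to 10, one per line.",
        "for i in range(10):\n    print(i)",
    ))
```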
Highlight
✅ Training the SLM closed the gap to GPT-4.1 on correctness and helpfulness criteria from 80% to 10%.

Trained 3B SLMs approach GPT-4.1 on correctness and helpfulness, while being locally deployable.
Impact
This study is one of the first deployments of trained SLMs for diagnostic programming feedback in a global MOOC.
It demonstrates that small, open-source models can provide timely, constructive, and scalable feedback.
Contributors
We thank all the teaching assistants who evaluated feedback quality. The full list of contributors can be found on the contributors page. Special thanks also to the Code In Place team for supporting the authors during the development of this project.
Authors:
- Charles Koutcheme (Aalto University)
- Juliette Woodrow (Stanford University)
- Chris Piech (Stanford University)