Texas ECE
EER 3.646 and via Zoom

Making LLMs Useful Teachers

Abhishek Panigrahi
Princeton University

Abstract: Training small language models requires effective distillation, yet existing methods treat teachers as static supervision sources. I argue that effective learning depends not only on what a model learns but also on when it learns it, a principle that extends beyond traditional teacher-student setups.

First, I show that intermediate teacher checkpoints reveal implicit learning curricula, and that aligning students to these trajectories yields provable sample-complexity benefits. Building on this, I develop GRACES, which predicts teacher–student compatibility from gradients, and STAT, which adapts supervision to a student’s weak skills. I show how these ideas extend beyond distillation to progressive subnetwork training and context-enhanced learning, pointing toward a more general theory of efficient learning. I close by outlining a vision for autonomous systems that can construct their own training curricula.

Speaker Bio: Abhishek is a final-year graduate student in the Computer Science department at Princeton University, advised by Prof. Sanjeev Arora. His research focuses on understanding and improving generalization in deep learning models, with an emphasis on principled training algorithms that offer theoretical or interpretable guarantees. He is an Apple AI/ML Scholar and a Siebel Scholar for 2025-26. Prior to his PhD, he was a resident at Microsoft Research India and studied computer science as an undergraduate at IIT Kharagpur.