Research · Completed Aug 2025 – Oct 2025

STaR: Self-Taught Reasoner

Fine-tuned Llama-3.2-3B achieving 46.2% on GSM8K (+65% over zero-shot CoT)

Role: Researcher · Duration: 3 months

Overview

Can you teach a model to reason better by having it practice reasoning? The STaR approach says yes. I fine-tuned Llama-3.2-3B on A100 GPUs with iteratively self-generated rationales, producing 3.5K synthetic training samples. The resulting 46.2% on GSM8K beat vanilla SFT (39.2%) by 7 absolute points, showing that models can bootstrap their own reasoning abilities.

The Problem

Language models often struggle with multi-step reasoning tasks, particularly mathematical problem-solving. While scaling model size improves performance, a more efficient approach is teaching models to reason more carefully through self-improvement — learning from their own correct reasoning chains.

The Approach

I implemented the Self-Taught Reasoner (STaR) methodology: the model generates rationales for training problems, filters for rationales that lead to correct answers, and then fine-tunes on those successful reasoning chains. This iterative process allows the model to bootstrap its own reasoning abilities.
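The filtering step above is the heart of STaR: keep only rationales whose final answer matches the gold label. A minimal sketch of that step, assuming GSM8K's convention of ending answers with `#### <number>` (the function and variable names here are illustrative, not the project's actual code):

```python
import re


def extract_answer(rationale: str):
    """Pull the final numeric answer from a generated rationale.

    GSM8K-style solutions end with '#### <number>', so we grab
    whatever follows the last '####' marker.
    """
    match = re.search(r"####\s*([\-\d,\.]+)", rationale)
    if match:
        # Normalize: strip thousands separators and a trailing period.
        return match.group(1).replace(",", "").rstrip(".")
    return None


def filter_rationales(samples):
    """Keep only (question, rationale, gold) triples whose predicted
    answer matches the gold label -- the core STaR filter. The kept
    pairs become fine-tuning data for the next iteration.
    """
    kept = []
    for question, rationale, gold in samples:
        predicted = extract_answer(rationale)
        if predicted is not None and predicted == str(gold):
            kept.append({"question": question, "rationale": rationale})
    return kept
```

In the full loop, the model generates several rationales per training problem, this filter discards the incorrect ones, and the survivors are used for the next round of fine-tuning.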

Built with PyTorch, Hugging Face Transformers, and parameter-efficient fine-tuning (PEFT/LoRA) to make the iterative training feasible. The training loop involved multiple rounds of rationale generation, filtering, and fine-tuning.
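Wrapping the base model with a LoRA adapter is what makes repeated fine-tuning rounds cheap: only the low-rank adapter weights train while the 3B base stays frozen. A configuration sketch using the Hugging Face PEFT API (the specific hyperparameters below are illustrative assumptions, not the values used in this project):

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

# Hypothetical LoRA hyperparameters -- the write-up does not state the exact setup.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                     # rank of the low-rank update matrices
    lora_alpha=32,            # scaling factor for the adapter output
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
)

# Requires access to the gated Llama weights; shown here for completeness.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # adapter params are a small fraction of 3B
```

Because each STaR round fine-tunes on a fresh filtered dataset, training only the adapter keeps per-round cost and checkpoint size small enough to iterate several times on a single A100.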

Results

The fine-tuned model reached 46.2% on the GSM8K mathematical reasoning benchmark, a 65% relative improvement over zero-shot chain-of-thought and 7 absolute points over vanilla SFT (39.2%). The iterative self-improvement loop proved effective at teaching more structured reasoning without requiring human-authored rationales.

Technology Stack

Python · PyTorch · Hugging Face · Llama-3.2-3B · NVIDIA A100