GRPO reinforcement-learning fine-tuning, implemented from scratch in NumPy (verified on CPU), plus a TRL/Qwen GPU path that runs on Modal. CLI-first, with runs tracked in MLflow.
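For readers unfamiliar with GRPO, its central step is scoring each sampled completion relative to the other completions drawn for the same prompt, which removes the need for a learned value function. The block below is a minimal NumPy sketch of that group-relative advantage calculation only. It is not taken from this repository: the function name `group_relative_advantages`, the `eps` parameter, and the example rewards are all hypothetical, chosen for illustration.

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within each group of completions.

    rewards: array-like of shape (num_prompts, group_size), one row per
    prompt and one column per sampled completion (assumed layout).
    Returns an array of the same shape holding (r - mean) / (std + eps),
    where mean and std are computed per row.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    # eps guards against division by zero when every completion in a
    # group receives the same reward.
    return (rewards - mean) / (std + eps)


# Hypothetical rewards: two prompts, four sampled completions each.
rewards = np.array([
    [1.0, 0.0, 0.0, 1.0],   # mixed outcomes
    [0.5, 0.5, 0.5, 0.5],   # identical rewards, so no learning signal
])
adv = group_relative_advantages(rewards)
print(adv)
```

In the second row every completion earns the same reward, so the advantages are zero and that group would contribute no gradient signal. A full GRPO update would also need the clipped policy-ratio objective and, typically, a KL penalty against a reference policy, neither of which is shown here.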