A minimal, hackable implementation of policy gradient methods (GRPO, PPO, REINFORCE).