A very simple GRPO implementation for reproducing R1-like LLM thinking.
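For readers new to GRPO (Group Relative Policy Optimization), the core idea is that each sampled completion is scored relative to the other completions drawn for the same prompt, so no separate value model is needed. The snippet below is a minimal sketch of that group-relative advantage step only, not this project's code: the function name `group_relative_advantages`, the `eps` parameter, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for a single prompt.

    Hypothetical helper for illustration: each advantage is
    (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    # Assumption: population std; some implementations use the sample std.
    sigma = pstdev(rewards)
    # eps keeps the division finite when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four completions for one prompt, rewarded 1.0 if correct, else 0.0.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Completions that beat the group average get a positive advantage and the rest get a negative one. When every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.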