Jupyter Notebook

build-llm-rl-intuition

An intuition ladder from supervised learning to LLM RL post-training (SFT, REINFORCE, PPO, GRPO, reward models, DPO), packed into one tiny runnable notebook.
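The notebook's own code is not reproduced here. The following is a hypothetical, minimal sketch, not taken from the project, of three of the objectives the description names, written per batch in plain Python. It shows a REINFORCE loss with a baseline, GRPO-style group-normalized advantages, and the DPO loss. The function names, the `beta` default, and the use of the population standard deviation in the GRPO step are illustrative assumptions; real implementations differ in these details. PPO's clipped ratio and reward-model training are omitted.

```python
import math


def reinforce_loss(logprobs, rewards, baseline=0.0):
    # Hypothetical helper: REINFORCE with a constant baseline.
    # Loss = -mean((r - b) * log pi(a|s)); minimizing it raises the
    # log-prob of samples whose reward beats the baseline.
    n = len(rewards)
    return -sum((r - baseline) * lp for lp, r in zip(logprobs, rewards)) / n


def grpo_advantages(rewards, eps=1e-8):
    # Hypothetical helper: GRPO-style advantages. Each reward is normalized
    # against the other completions sampled for the same prompt, so no
    # learned value function is needed. Population std is an assumption.
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Hypothetical helper: DPO loss for one preference pair, given summed
    # sequence log-probs under the policy and a frozen reference model.
    # Loss = -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).
    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    # -log sigmoid(x) == log(1 + exp(-x))
    return math.log1p(math.exp(-margin))


if __name__ == "__main__":
    print(reinforce_loss([-1.0, -2.0], [1.0, 0.0]))
    print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # approximately [1, -1, -1, 1]
    print(dpo_loss(0.0, 0.0, 0.0, 0.0))           # log(2) when policy == reference
```

The sketch makes the ladder's common thread visible. All three objectives weight log-probabilities by some notion of "better than expected". That notion is a raw reward minus a baseline in REINFORCE, a within-group z-score in GRPO, and the policy's log-ratio margin over a reference model in DPO.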

Last push: 11d ago
