understanding-rlhf

Learning from preferences is a common paradigm for fine-tuning language models, but it involves many algorithmic design decisions. Our new work finds that approaches employing on-policy sampling or negative gradients outperform offline, maximum-likelihood objectives.
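The following is a toy sketch of what "negative gradient" means here. It contrasts a maximum-likelihood objective on the preferred response with a DPO-style contrastive loss. DPO is used only as one familiar example of an objective that pushes down the dispreferred response. It is not necessarily what this repository implements. The function names, `beta`, the learning rate, and the treatment of sequence log-probabilities as free scalar parameters are all illustrative assumptions.

```python
import math
from functools import partial


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mle_grads(logp_chosen, logp_rejected):
    # Maximum likelihood on the preferred response: loss = -logp_chosen.
    # The rejected response never appears in the loss, so its gradient is 0.
    return -1.0, 0.0


def dpo_grads(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    # DPO-style loss (assumed for illustration): -log sigmoid(beta * margin),
    # margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected).
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    w = sigmoid(-beta * margin)  # larger weight when the pair is mis-ranked
    # dL/dlogp_chosen = -beta * w ; dL/dlogp_rejected = +beta * w (the negative gradient)
    return -beta * w, beta * w


def train(grad_fn, init=-5.0, steps=100, lr=0.01):
    # Hypothetical toy setup: both log-probs start at `init` and are updated
    # directly by gradient descent (no real model or normalization involved).
    logp_c, logp_r = init, init
    for _ in range(steps):
        g_c, g_r = grad_fn(logp_c, logp_r)
        logp_c -= lr * g_c
        logp_r -= lr * g_r
    return logp_c, logp_r


mle_c, mle_r = train(mle_grads)
dpo_c, dpo_r = train(partial(dpo_grads, ref_chosen=-5.0, ref_rejected=-5.0))
# MLE raises the chosen log-prob but leaves the rejected one untouched;
# the DPO-style loss raises the chosen log-prob and lowers the rejected one.
print(mle_c, mle_r, dpo_c, dpo_r)
```

The point of the design is the sign of the update on the rejected response: zero under pure maximum likelihood, and a downward push under a contrastive objective.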

Stars: n/a
Forks: n/a
Contributors: 1
Last push: 26 months ago

Recent commits

  • Update README.md (9d5844e, Anikait Singh, 26 months ago)
  • license (2c0823f, Anikait Singh, 26 months ago)
  • llm experiments (1a24ca8, Anikait Singh, 26 months ago)
  • bandit experiments (6af8169, Anikait Singh, 26 months ago)

Top contributors

  • Asap7772 (4 commits)