What is Reinforcement Learning from Human Feedback (RLHF)?
A technique that aligns model behavior by turning human evaluations into a reward signal.
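RLHF first trains a reward model on human preference data, typically rankings or pairwise comparisons of model outputs. This reward model acts as a learned proxy for human judgment. The generative model (the policy) is then fine-tuned with a reinforcement learning algorithm, commonly PPO, to maximize the reward model's scores, which steers its outputs toward the responses people prefer. A penalty on divergence from the original model is usually added so the policy does not drift too far or exploit flaws in the proxy reward.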
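The two stages can be illustrated with a toy sketch. It assumes a pairwise Bradley-Terry loss for the reward model, a common choice that this definition does not itself specify, and it uses a hypothetical linear reward model over hand-made feature vectors in place of a neural network scoring text. The names `reward`, `pairwise_loss`, `train_reward_model` and `shaped_reward`, the feature values, and the penalty coefficient `beta` are all illustrative assumptions. `shaped_reward` shows only the per-sample reward an RL algorithm such as PPO would maximize, not the policy update itself.

```python
import math


def reward(weights, features):
    # Hypothetical linear reward model: r(x) = w . phi(x).
    # A real RLHF reward model is a neural network scoring (prompt, response) text.
    return sum(w * f for w, f in zip(weights, features))


def pairwise_loss(weights, chosen, rejected):
    # Bradley-Terry loss (assumed): -log sigmoid(r(chosen) - r(rejected)),
    # written equivalently as log(1 + exp(-margin)).
    margin = reward(weights, chosen) - reward(weights, rejected)
    return math.log1p(math.exp(-margin))


def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    # Fit the reward model to human preference pairs (chosen, rejected)
    # with plain stochastic gradient descent on pairwise_loss.
    weights = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # The loss gradient is -(1 - sigmoid(margin)) * (chosen - rejected),
            # so stepping against it adds a positive multiple of (chosen - rejected).
            scale = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            weights = [w + lr * scale * (c - r)
                       for w, c, r in zip(weights, chosen, rejected)]
    return weights


def shaped_reward(rm_score, logprob_policy, logprob_reference, beta=0.1):
    # Per-sample reward for the RL stage (assumed KL-penalized form):
    # r(x, y) - beta * (log pi(y|x) - log pi_ref(y|x)).
    # The penalty grows as the policy drifts from the reference model.
    return rm_score - beta * (logprob_policy - logprob_reference)


if __name__ == "__main__":
    # Hypothetical features per response: [helpfulness, toxicity].
    # In each pair, annotators preferred the first response.
    preferences = [
        ([0.9, 0.1], [0.2, 0.8]),
        ([0.7, 0.0], [0.6, 0.9]),
        ([0.8, 0.2], [0.3, 0.3]),
    ]
    w = train_reward_model(preferences, dim=2)
    print("learned weights:", w)
    score = reward(w, [0.6, 0.4])
    print("score for a new response:", score)
    print("shaped reward:", shaped_reward(score, -2.0, -2.5))
```

Because every preferred response in the toy data has higher helpfulness and lower toxicity than its rejected counterpart, training pushes the first weight up and the second down. The learned reward then ranks unseen responses the same way, and that score, minus the divergence penalty, is the signal the RL stage optimizes.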