Beyond Scalar Rewards: Integrating Group-Level Natural Language Feedback in RL Pipelines
By integrating group-level natural language feedback as off-policy scaffolds, engineers can achieve a 2.2x improvement in sample efficiency over conventional RLHF pipelines that rely on scalar rewards alone.