NVIDIA Research Scientist
Pinned
- RLHFlow/RLHF-Reward-Modeling (Public): Recipes to train reward models for RLHF.
- AI-secure/multi-task-learning (Public): Code for the ICML 2021 paper "Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation" by Haoxiang Wang, Han Zhao, and Bo Li.
- RLHFlow/Directional-Preference-Alignment (Public): Directional Preference Alignment