Commit
Supports `max_exploration_probability_hint` in `FalconRewardPredictionPolicy`

This new parameter lets users configure the behavior of the policy by expressing their tolerance for the amount of exploration, which may be more tangible, and hence easier to determine, than `exploitation_coefficient`. It is an optional float representing a hint on the maximum exploration probability, internally clipped to [0, 1]. When it is set, `exploitation_coefficient` is ignored and the policy attempts to choose non-greedy actions with at most this probability. When such an upper bound cannot be achieved, e.g. due to insufficient training data, the policy attempts to minimize the probability of choosing non-greedy actions on a best-effort basis.

PiperOrigin-RevId: 516896185
Change-Id: I0a3c27ce23dec426d0293d846fdf305b7caa247b
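The behavior described in the commit message can be illustrated with a minimal sketch. It assumes a FALCON-style inverse-gap-weighting distribution, in which each non-greedy action gets probability `1 / (K + gamma * gap)` and the greedy action takes the remaining mass, with a single coefficient `gamma` standing in for `exploitation_coefficient`. The helper names (`falcon_action_probs`, `exploration_probability`, `gamma_for_hint`) and the search limit `gamma_max` are hypothetical and not part of the TF-Agents API; the actual `FalconRewardPredictionPolicy` may compute its exploration setting differently.

```python
import numpy as np


def falcon_action_probs(predicted_rewards, gamma):
    """Assumed inverse-gap-weighted action distribution (illustrative only)."""
    r = np.asarray(predicted_rewards, dtype=np.float64)
    k = r.size
    greedy = int(np.argmax(r))
    gaps = r[greedy] - r  # Non-negative; zero for the greedy action and ties.
    probs = 1.0 / (k + gamma * gaps)
    probs[greedy] = 0.0
    probs[greedy] = 1.0 - probs.sum()  # Greedy action takes the remaining mass.
    return probs


def exploration_probability(predicted_rewards, gamma):
    """Total probability of choosing a non-greedy action."""
    probs = falcon_action_probs(predicted_rewards, gamma)
    return 1.0 - probs[int(np.argmax(predicted_rewards))]


def gamma_for_hint(predicted_rewards, max_exploration_probability_hint,
                   gamma_max=1e6, iters=60):
    """Hypothetical: smallest gamma whose exploration stays within the hint.

    gamma_max is an assumed search limit used to model the best-effort case.
    """
    # The hint is clipped to [0, 1], as the commit message describes.
    hint = float(np.clip(max_exploration_probability_hint, 0.0, 1.0))
    # gamma = 0 gives the uniform distribution; if that already fits, stop.
    if exploration_probability(predicted_rewards, 0.0) <= hint:
        return 0.0
    # Bound not reachable within the search range: best effort, i.e. the
    # gamma with the least exploration that the search considers.
    if exploration_probability(predicted_rewards, gamma_max) > hint:
        return gamma_max
    # Exploration is non-increasing in gamma, so bisect. Invariant:
    # exploration(lo) > hint and exploration(hi) <= hint.
    lo, hi = 0.0, gamma_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if exploration_probability(predicted_rewards, mid) <= hint:
            hi = mid
        else:
            lo = mid
    return hi
```

With predicted rewards `[1.0, 0.0, 0.0]`, each non-greedy action gets probability `1 / (3 + gamma)`, so a hint of 0.1 yields `gamma` close to 17. If two actions tie for the best predicted reward, as in `[1.0, 1.0, 0.0]`, the tied non-greedy action keeps probability 1/3 for any `gamma`, so a hint below that cannot be met and the sketch falls back to the largest `gamma` it searches.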
TF-Agents Team authored and Copybara-Service committed on Mar 15, 2023
1 parent: dd0c0f8
Commit: a8cef4c
Showing 4 changed files with 454 additions and 98 deletions.