
Set TFPyEnv's reward shape per wrapped env's time_step_spec. #325

Merged

Conversation

@mhe500 (Contributor) commented on Mar 6, 2020

This allows environments to have non-scalar rewards.
Relates to #239.
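
For context, here is a minimal sketch of the kind of environment this change enables, assuming a current tf_agents release with reward_spec support; the MultiRewardEnv class and its specs are hypothetical, chosen for illustration only:

import numpy as np
from tf_agents.environments import py_environment, tf_py_environment
from tf_agents.specs import array_spec
from tf_agents.trajectories import time_step as ts

class MultiRewardEnv(py_environment.PyEnvironment):
  """Hypothetical environment whose reward is a length-2 vector."""

  def __init__(self):
    self._action_spec = array_spec.BoundedArraySpec(
        shape=(), dtype=np.int32, minimum=0, maximum=1, name='action')
    self._observation_spec = array_spec.ArraySpec(
        shape=(2,), dtype=np.float32, name='observation')
    # Non-scalar reward spec: shape (2,) instead of the default ().
    self._reward_spec = array_spec.ArraySpec(
        shape=(2,), dtype=np.float32, name='reward')

  def action_spec(self):
    return self._action_spec

  def observation_spec(self):
    return self._observation_spec

  def reward_spec(self):
    return self._reward_spec

  def _reset(self):
    return ts.TimeStep(
        step_type=np.asarray(ts.StepType.FIRST, dtype=np.int32),
        reward=np.zeros(2, dtype=np.float32),
        discount=np.asarray(1.0, dtype=np.float32),
        observation=np.zeros(2, dtype=np.float32))

  def _step(self, action):
    # Emit a two-component reward each step; episode termination is
    # omitted to keep the sketch short.
    return ts.TimeStep(
        step_type=np.asarray(ts.StepType.MID, dtype=np.int32),
        reward=np.array([1.0, -1.0], dtype=np.float32),
        discount=np.asarray(1.0, dtype=np.float32),
        observation=np.zeros(2, dtype=np.float32))

# With this change, wrapping preserves the non-scalar reward shape:
tf_env = tf_py_environment.TFPyEnvironment(MultiRewardEnv())
print(tf_env.time_step_spec().reward.shape)  # (2,)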

@@ -333,7 +336,7 @@ def _set_names_and_shapes(self, step_type, reward, discount,
     batch_shape = tf.TensorShape(batch_shape)
     if not tf.executing_eagerly():
       # Shapes are not required in eager mode.
-      reward.set_shape(batch_shape)
+      reward.set_shape(batch_shape.concatenate(self._time_step_shapes[1]))
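
The changed line appends the reward's per-step shape to the batch shape. A standalone sketch of that concatenation, with shapes chosen arbitrarily:

import tensorflow as tf

batch_shape = tf.TensorShape([None])           # unknown batch size
reward_shape = tf.TensorShape([2])             # e.g. a 2-vector reward
print(batch_shape.concatenate(reward_shape))   # (None, 2)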
@ebrevdo (Contributor) commented:

Why [1]? And why calculate all flattened shapes if you only use one of them?

@mhe500 (Contributor, Author) commented on Mar 18, 2020

Hi @ebrevdo. Thanks for the feedback.

I agree: it is not as clear as it should be, and it is not necessary to cache all of the shapes. I've updated the change to stop caching shapes in __init__ and to query self.time_step_spec().reward.shape explicitly, rather than referring to the reward shape by its index position ([1]) in the flattened structure.
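
A sketch of what the updated line might look like after that change, reconstructed from the description above (hypothetical, not the merged code):

# Query the reward shape from the spec directly instead of
# indexing the flattened shapes cached in __init__.
reward_shape = self.time_step_spec().reward.shape
reward.set_shape(batch_shape.concatenate(reward_shape))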

@tf-agents-copybara merged commit 5503c20 into tensorflow:master on Mar 19, 2020