
Set TFPyEnv's reward shape per wrapped env's time_step_spec. #325

Merged

Conversation

@mhe500 (Contributor) commented on Mar 6, 2020

This allows environments to have non-scalar rewards.
Relates to #239.
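
For context, here is a minimal sketch of the kind of environment this change enables, assuming a current tf_agents release with reward_spec support; the MultiRewardEnv class and its specs are hypothetical, chosen for illustration only:

import numpy as np
from tf_agents.environments import py_environment, tf_py_environment
from tf_agents.specs import array_spec
from tf_agents.trajectories import time_step as ts

class MultiRewardEnv(py_environment.PyEnvironment):
  """Hypothetical environment whose reward is a length-2 vector."""

  def __init__(self):
    self._action_spec = array_spec.BoundedArraySpec(
        shape=(), dtype=np.int32, minimum=0, maximum=1, name='action')
    self._observation_spec = array_spec.ArraySpec(
        shape=(2,), dtype=np.float32, name='observation')
    # Non-scalar reward spec: shape (2,) instead of the default ().
    self._reward_spec = array_spec.ArraySpec(
        shape=(2,), dtype=np.float32, name='reward')

  def action_spec(self):
    return self._action_spec

  def observation_spec(self):
    return self._observation_spec

  def reward_spec(self):
    return self._reward_spec

  def _reset(self):
    return ts.TimeStep(
        step_type=np.asarray(ts.StepType.FIRST, dtype=np.int32),
        reward=np.zeros(2, dtype=np.float32),
        discount=np.asarray(1.0, dtype=np.float32),
        observation=np.zeros(2, dtype=np.float32))

  def _step(self, action):
    # Emit a two-component reward each step; episode termination is
    # omitted to keep the sketch short.
    return ts.TimeStep(
        step_type=np.asarray(ts.StepType.MID, dtype=np.int32),
        reward=np.array([1.0, -1.0], dtype=np.float32),
        discount=np.asarray(1.0, dtype=np.float32),
        observation=np.zeros(2, dtype=np.float32))

# With this change, wrapping preserves the non-scalar reward shape:
tf_env = tf_py_environment.TFPyEnvironment(MultiRewardEnv())
print(tf_env.time_step_spec().reward.shape)  # (2,)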

@@ -333,7 +336,7 @@ def _set_names_and_shapes(self, step_type, reward, discount,
     batch_shape = tf.TensorShape(batch_shape)
     if not tf.executing_eagerly():
       # Shapes are not required in eager mode.
-      reward.set_shape(batch_shape)
+      reward.set_shape(batch_shape.concatenate(self._time_step_shapes[1]))
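
The changed line appends the reward's per-step shape to the batch shape. A standalone sketch of that concatenation, with shapes chosen arbitrarily:

import tensorflow as tf

batch_shape = tf.TensorShape([None])           # unknown batch size
reward_shape = tf.TensorShape([2])             # e.g. a 2-vector reward
print(batch_shape.concatenate(reward_shape))   # (None, 2)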
@ebrevdo (Contributor) commented:

Why [1]? And why calculate all flattened shapes if you only use one of them?

@mhe500 (Contributor, Author) commented on Mar 18, 2020

Hi @ebrevdo. Thanks for the feedback.

I agree: it is not as clear as it should be, and it is not necessary to cache all of the shapes. I've updated the change to stop caching shapes in __init__ and to query self.time_step_spec().reward.shape explicitly, rather than referring to the reward shape by its index position ([1]) in the flattened structure.
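
A sketch of what the updated line might look like after that change, reconstructed from the description above (hypothetical, not the merged code):

# Query the reward shape from the spec directly instead of
# indexing the flattened shapes cached in __init__.
reward_shape = self.time_step_spec().reward.shape
reward.set_shape(batch_shape.concatenate(reward_shape))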

@tf-agents-copybara merged commit 5503c20 into tensorflow:master on Mar 19, 2020