Hi,

I was going over your dataset code and noticed that you're sampling from the episode buffer randomly. In general this is the right idea, since consecutive episodes are strongly correlated, but your technique picks an episode independently at random on each step (i.e., with replacement) rather than guaranteeing that every episode is seen once before any episode is seen twice.
It's probably not a big deal, since on average the sampling is still uniform, but I was wondering whether you had a reason for this implementation choice?
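For concreteness, here's a minimal sketch of the two schemes I mean (names are hypothetical, not taken from your code):

```python
import random

# Stand-in for the repo's actual episode buffer.
episodes = [f"ep_{i}" for i in range(5)]

# What I believe the current code does: an independent draw each step,
# so some episodes can repeat before others are seen at all.
def sample_with_replacement():
    return random.choice(episodes)

# The alternative I'm describing: reshuffle once per "epoch" so every
# episode is seen exactly once before any episode is seen twice.
def epoch_iterator():
    while True:
        yield from random.sample(episodes, len(episodes))
```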
Thanks again for writing this repo.
In off-policy reinforcement learning, it's common practice to sample steps stochastically from the replay buffer. For reference, the original DreamerV3 implementation takes a similar approach: it randomly selects chunks of 1024 consecutive steps and then samples training sequences from within those chunks.
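A rough sketch of that chunked scheme, in case it helps (hypothetical names and constants; this is not the actual DreamerV3 code):

```python
import numpy as np

CHUNK = 1024   # consecutive steps per chunk
SEQ_LEN = 64   # training sequence length (illustrative value)

def sample_sequence(buffer: np.ndarray) -> np.ndarray:
    """Sample one training sequence; assumes len(buffer) >= CHUNK."""
    # Pick a random chunk of 1024 consecutive steps...
    start = np.random.randint(0, len(buffer) - CHUNK + 1)
    chunk = buffer[start : start + CHUNK]
    # ...then sample a sequence from within that chunk.
    offset = np.random.randint(0, CHUNK - SEQ_LEN + 1)
    return chunk[offset : offset + SEQ_LEN]
```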
In my repository, the replay buffer stores data episode by episode. I made this choice because it makes individual episodes easier to handle.
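Roughly, the idea looks like this (a minimal sketch with hypothetical names; the actual classes in the repo differ):

```python
import random
from collections import deque

# Each entry is one complete episode (a list of transitions).
replay = deque(maxlen=1000)

def add_episode(episode):
    # Keeping episodes whole makes per-episode operations
    # (saving to disk, truncation, computing returns) straightforward.
    replay.append(episode)

def sample_batch(batch_size, seq_len):
    """Assumes every stored episode has at least seq_len steps."""
    batch = []
    for _ in range(batch_size):
        ep = random.choice(replay)                # uniform over stored episodes
        t = random.randint(0, len(ep) - seq_len)  # uniform start within episode
        batch.append(ep[t : t + seq_len])
    return batch
```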
I hope this clarifies the implementation choice. If you have any further questions, please feel free to share them.