Fix Bug: Configuring Datasets with train-data-path, valid-data-path, test-data-path #840

Open: wants to merge 1 commit into main


Eisenhower

Fixed the bug that prevents configuring datasets using train-data-path, valid-data-path, and test-data-path.

When --split is not set explicitly, it falls back to its default value, 969, 30, 1. As a result, the following check in the __post_init__ function of blended_megatron_dataset_config.py raises an error whenever datasets are configured with train-data-path, valid-data-path, and test-data-path, because split is not None:

if self.blend_per_split is not None and any(self.blend_per_split):
    assert self.blend is None, "blend and blend_per_split are incompatible"
    assert self.split is None, "split and blend_per_split are incompatible"
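
To make the failure mode concrete, here is a minimal sketch that reproduces the conflict outside the real codebase. `DatasetConfigSketch`, `resolve_split`, and `build_config` are hypothetical names used only for illustration; they are not part of the actual config class or of this PR's diff. The sketch assumes one plausible way to avoid the error, namely dropping the defaulted split whenever any per-split path is supplied; the actual change in this PR may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetConfigSketch:
    """Simplified stand-in for the real config; only the relevant fields."""

    blend: Optional[List[str]] = None
    blend_per_split: Optional[List[Optional[List[str]]]] = None
    split: Optional[str] = None

    def __post_init__(self) -> None:
        # Same check as quoted in the description above.
        if self.blend_per_split is not None and any(self.blend_per_split):
            assert self.blend is None, "blend and blend_per_split are incompatible"
            assert self.split is None, "split and blend_per_split are incompatible"


# Default value of --split, as stated in the description above.
DEFAULT_SPLIT = "969, 30, 1"


def resolve_split(split: Optional[str], per_split_paths: list) -> Optional[str]:
    """Hypothetical helper: ignore the split when any per-split path is given."""
    if any(per_split_paths):
        return None
    return split


def build_config(train, valid, test, split=DEFAULT_SPLIT, apply_fix=True):
    """Build the sketch config from per-split paths (hypothetical wiring)."""
    per_split = [train, valid, test]
    if apply_fix:
        split = resolve_split(split, per_split)
    return DatasetConfigSketch(blend_per_split=per_split, split=split)
```

With `apply_fix=False`, passing any per-split path trips the second assertion because `split` still holds the default. With the helper applied, `split` becomes None and the config is constructed; when no per-split paths are given, the default split is left untouched.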

Eisenhower changed the title from "Fix the bug that prevents configuring datasets using train-data-path,…" to "Fix Bug: Configuring Datasets with train-data-path, valid-data-path, test-data-path" on May 27, 2024