Fix Bug: Configuring Datasets with train-data-path, valid-data-path, test-data-path #840

Eisenhower · 2024-05-27T11:42:19Z

Fixed the bug that prevents configuring datasets using train-data-path, valid-data-path, and test-data-path.

When --split is not passed, it falls back to its default value, 969, 30, 1. In blended_megatron_dataset_config.py, the __post_init__ method runs the checks below. Because split is then not None, the second assertion fails whenever datasets are configured with train-data-path, valid-data-path, and test-data-path:

if self.blend_per_split is not None and any(self.blend_per_split):
    assert self.blend is None, "blend and blend_per_split are incompatible"
    assert self.split is None, "split and blend_per_split are incompatible"
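
To make the failure easy to reproduce outside a training run, here is a minimal sketch. It assumes a stand-in dataclass, `SketchDatasetConfig`, that copies only the checks quoted above; it is not the real config class. `resolve_split` is a hypothetical helper showing one possible way to avoid the conflict, by dropping the default split when per-split paths are given. It is not necessarily the change made in this PR.

```python
from dataclasses import dataclass
from typing import List, Optional

# Default value of --split as quoted in the description above.
DEFAULT_SPLIT = "969, 30, 1"


@dataclass
class SketchDatasetConfig:
    """Hypothetical stand-in: reproduces only the quoted checks."""

    blend: Optional[List[str]] = None
    blend_per_split: Optional[List[Optional[List[str]]]] = None
    split: Optional[str] = None

    def __post_init__(self) -> None:
        if self.blend_per_split is not None and any(self.blend_per_split):
            assert self.blend is None, "blend and blend_per_split are incompatible"
            assert self.split is None, "split and blend_per_split are incompatible"


def resolve_split(split: Optional[str], per_split_paths) -> Optional[str]:
    """Hypothetical helper: ignore split when any per-split path is set."""
    if any(per_split_paths):
        return None
    return split


# Placeholder prefixes standing in for train/valid/test data paths.
per_split = [["train_prefix"], ["valid_prefix"], ["test_prefix"]]

# Passing the default split together with per-split paths trips the assertion.
try:
    SketchDatasetConfig(blend_per_split=per_split, split=DEFAULT_SPLIT)
    failed = False
    message = ""
except AssertionError as err:
    failed = True
    message = str(err)

# Clearing split first lets the same configuration pass the checks.
config = SketchDatasetConfig(
    blend_per_split=per_split,
    split=resolve_split(DEFAULT_SPLIT, per_split),
)
```

The point of the sketch is only that the default split must not reach the config object when per-split paths are in use. Where that reset belongs (argument parsing or config construction) is a design choice the sketch does not settle.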


Fix the bug that prevents configuring datasets using train-data-path,…

298bdff

… valid-data-path, and test-data-path.

Eisenhower changed the title ~~Fix the bug that prevents configuring datasets using train-data-path,…~~ Fix Bug: Configuring Datasets with train-data-path, valid-data-path, test-data-path May 27, 2024

Eisenhower mentioned this pull request May 27, 2024

Configuring datasets using train-data-path, valid-data-path, and test-data-path results in training errors #841

Open
