
Cutoff Length only followed for chosen response in Pairwise Data for DPO #4402

Closed
1 task done
niravlg opened this issue Jun 20, 2024 · 2 comments
Labels
solved This problem has been already solved

Comments

niravlg commented Jun 20, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

I encountered an inconsistency between the way truncation is implemented for DPO in LLaMA-Factory and in HuggingFace's DPOTrainer.

In LLaMA-Factory, the cutoff length appears to be enforced only for the chosen response. infer_max_len is applied individually to both (prompt + chosen) and (prompt + rejected) (see the pairwise dataset implementation here).

Check out the way infer_max_len is used; its definition is here.

However, to keep the prompt identical for both responses, the prompt ids obtained by truncating the chosen side are prepended to the rejected response. As a result, the rejected sequences can exceed the cutoff limit.
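The behavior described above can be sketched as follows. Note this is a simplified illustration, not the actual LLaMA-Factory code: infer_max_len here is a stand-in that splits the cutoff budget proportionally between prompt and response, and encode_pair is a hypothetical helper name.

```python
def infer_max_len(source_len, target_len, cutoff_len):
    # Simplified stand-in: split the cutoff budget between prompt and
    # response in proportion to their original lengths.
    max_target_len = max(1, int(cutoff_len * target_len / (source_len + target_len)))
    max_source_len = cutoff_len - min(max_target_len, target_len)
    return max_source_len, max_target_len

def encode_pair(prompt_ids, chosen_ids, rejected_ids, cutoff_len=2048):
    # Budgets are computed independently for each (prompt, response) pair.
    c_src, c_tgt = infer_max_len(len(prompt_ids), len(chosen_ids), cutoff_len)
    _, r_tgt = infer_max_len(len(prompt_ids), len(rejected_ids), cutoff_len)
    prompt = prompt_ids[:c_src]      # prompt length set by the *chosen* budget only
    chosen = chosen_ids[:c_tgt]
    rejected = rejected_ids[:r_tgt]  # can be longer than cutoff_len - len(prompt)
    # The chosen-derived prompt is reused for the rejected side, so the
    # rejected sequence may exceed cutoff_len even though the chosen one fits.
    return prompt + chosen, prompt + rejected
```

With a long rejected response, the rejected side keeps a large response budget while inheriting the (longer) chosen-side prompt, so its total length overshoots the cutoff.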

Reproduction

I printed out the sequence lengths for both chosen and rejected responses and noticed this discrepancy (cutoff is set to 2048; chosen responses adhere to it, rejected responses do not):

{'chosen_input_ids_length': 2048, 'rejected_input_ids_length': 2543, 'chosen_labels_length': 2048, 'rejected_labels_length': 2543}
{'chosen_input_ids_length': 2048, 'rejected_input_ids_length': 2487, 'chosen_labels_length': 2048, 'rejected_labels_length': 2487}
{'chosen_input_ids_length': 2048, 'rejected_input_ids_length': 2400, 'chosen_labels_length': 2048, 'rejected_labels_length': 2400}
{'chosen_input_ids_length': 2048, 'rejected_input_ids_length': 2334, 'chosen_labels_length': 2048, 'rejected_labels_length': 2334}
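A check along these lines (field names assumed to match the log keys above, minus the `_length` suffix) surfaces the discrepancy on any tokenized pairwise dataset:

```python
def check_cutoff(dataset, cutoff_len=2048):
    # Collect every example whose chosen or rejected sequence exceeds the cutoff.
    violations = []
    for i, ex in enumerate(dataset):
        lengths = {
            "chosen_input_ids_length": len(ex["chosen_input_ids"]),
            "rejected_input_ids_length": len(ex["rejected_input_ids"]),
        }
        if max(lengths.values()) > cutoff_len:
            violations.append((i, lengths))
    return violations
```

Running this over the tokenized pairwise dataset reports only rejected-side violations, matching the lengths printed above.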

Expected behavior

This causes two major problems:

  1. The rejected sequences may exceed the cutoff limit, resulting in Out of Memory (OOM) errors in the middle of a run.
  2. It is inconsistent with how HuggingFace's DPOTrainer is implemented.

In HuggingFace, DPOTrainer uses the longer of the chosen and rejected responses to decide how much of the prompt and the responses to truncate, limiting both the chosen and rejected sequences to max_length. Check out the exact implementation here.
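For contrast, the joint scheme described above can be sketched like this. Again, this is a simplified sketch rather than the exact DPOTrainer code; truncate_pair and the keep-the-end prompt truncation are illustrative assumptions.

```python
def truncate_pair(prompt_ids, chosen_ids, rejected_ids,
                  max_length=2048, max_prompt_length=512):
    # Decide truncation from the *longer* of the two responses, so that
    # both final sequences fit within max_length.
    longer = max(len(chosen_ids), len(rejected_ids))
    if len(prompt_ids) + longer > max_length:
        prompt_ids = prompt_ids[-max_prompt_length:]  # keep the end of the prompt
    if len(prompt_ids) + longer > max_length:
        budget = max_length - len(prompt_ids)
        chosen_ids = chosen_ids[:budget]     # both responses get the same budget
        rejected_ids = rejected_ids[:budget]
    return prompt_ids + chosen_ids, prompt_ids + rejected_ids
```

Because the budget is derived from the longer response, both sequences share the same prompt and both respect max_length.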

Could you also let us know why the cutoff length has been implemented this way? Is this a commonly used method for DPO?

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Jun 20, 2024
@niravlg niravlg changed the title from "Cutoff Length only followed for chosen response in LLaMA-Factory DPO" to "Cutoff Length only followed for chosen response in Pairwise Data for DPO" Jun 20, 2024
FangLi1 commented Jun 21, 2024

follow+1

@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Jun 30, 2024
Owner
hiyouga commented Jun 30, 2024

Thank you for raising this issue; we have fixed it in the latest version.

PrimaLuz pushed a commit to PrimaLuz/LLaMA-Factory that referenced this issue Jul 1, 2024
Deprecate reserved_label_len arg