
Cutoff Length only followed for chosen response in Pairwise Data for DPO #4402

Closed
1 task done
niravlg opened this issue Jun 20, 2024 · 2 comments
Labels
solved This problem has been already solved

Comments

niravlg commented Jun 20, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

I encountered an inconsistency between the way truncation is implemented for DPO in LLaMA-Factory and in HuggingFace's DPOTrainer.

In LLaMA-Factory, the cutoff length appears to be enforced only for the chosen response. infer_max_len is applied individually to both (prompt + chosen) and (prompt + rejected) (see the pairwise dataset implementation here).

Check out the way infer_max_len is used; its definition is here.

However, to keep the prompt identical for both responses, the prompt ids obtained by truncating the chosen side are prepended to the rejected response. As a result, the rejected sequences can exceed the cutoff limit.
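The behavior described above can be sketched as follows. Note this is a simplified illustration, not the actual LLaMA-Factory code: infer_max_len here is a stand-in that splits the cutoff budget proportionally between prompt and response, and encode_pair is a hypothetical helper name.

```python
def infer_max_len(source_len, target_len, cutoff_len):
    # Simplified stand-in: split the cutoff budget between prompt and
    # response in proportion to their original lengths.
    max_target_len = max(1, int(cutoff_len * target_len / (source_len + target_len)))
    max_source_len = cutoff_len - min(max_target_len, target_len)
    return max_source_len, max_target_len

def encode_pair(prompt_ids, chosen_ids, rejected_ids, cutoff_len=2048):
    # Budgets are computed independently for each (prompt, response) pair.
    c_src, c_tgt = infer_max_len(len(prompt_ids), len(chosen_ids), cutoff_len)
    _, r_tgt = infer_max_len(len(prompt_ids), len(rejected_ids), cutoff_len)
    prompt = prompt_ids[:c_src]      # prompt length set by the *chosen* budget only
    chosen = chosen_ids[:c_tgt]
    rejected = rejected_ids[:r_tgt]  # can be longer than cutoff_len - len(prompt)
    # The chosen-derived prompt is reused for the rejected side, so the
    # rejected sequence may exceed cutoff_len even though the chosen one fits.
    return prompt + chosen, prompt + rejected
```

With a long rejected response, the rejected side keeps a large response budget while inheriting the (longer) chosen-side prompt, so its total length overshoots the cutoff.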

Reproduction

I printed out the sequence lengths for both chosen and rejected responses and noticed this discrepancy (cutoff is set to 2048; chosen responses adhere to it, rejected responses do not):

{'chosen_input_ids_length': 2048, 'rejected_input_ids_length': 2543, 'chosen_labels_length': 2048, 'rejected_labels_length': 2543}
{'chosen_input_ids_length': 2048, 'rejected_input_ids_length': 2487, 'chosen_labels_length': 2048, 'rejected_labels_length': 2487}
{'chosen_input_ids_length': 2048, 'rejected_input_ids_length': 2400, 'chosen_labels_length': 2048, 'rejected_labels_length': 2400}
{'chosen_input_ids_length': 2048, 'rejected_input_ids_length': 2334, 'chosen_labels_length': 2048, 'rejected_labels_length': 2334}
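A check along these lines (field names assumed to match the log keys above, minus the `_length` suffix) surfaces the discrepancy on any tokenized pairwise dataset:

```python
def check_cutoff(dataset, cutoff_len=2048):
    # Collect every example whose chosen or rejected sequence exceeds the cutoff.
    violations = []
    for i, ex in enumerate(dataset):
        lengths = {
            "chosen_input_ids_length": len(ex["chosen_input_ids"]),
            "rejected_input_ids_length": len(ex["rejected_input_ids"]),
        }
        if max(lengths.values()) > cutoff_len:
            violations.append((i, lengths))
    return violations
```

Running this over the tokenized pairwise dataset reports only rejected-side violations, matching the lengths printed above.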

Expected behavior

This causes two major problems:

  1. The rejected sequences may exceed the cutoff limit, resulting in Out of Memory (OOM) errors in the middle of a run.
  2. It is inconsistent with how HuggingFace's DPOTrainer is implemented.

In HuggingFace, DPOTrainer uses the longer of the chosen and rejected responses to decide how much of the prompt and the responses to truncate, limiting both the chosen and rejected sequences to max_length. Check out the exact implementation here.
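For contrast, the joint scheme described above can be sketched like this. Again, this is a simplified sketch rather than the exact DPOTrainer code; truncate_pair and the keep-the-end prompt truncation are illustrative assumptions.

```python
def truncate_pair(prompt_ids, chosen_ids, rejected_ids,
                  max_length=2048, max_prompt_length=512):
    # Decide truncation from the *longer* of the two responses, so that
    # both final sequences fit within max_length.
    longer = max(len(chosen_ids), len(rejected_ids))
    if len(prompt_ids) + longer > max_length:
        prompt_ids = prompt_ids[-max_prompt_length:]  # keep the end of the prompt
    if len(prompt_ids) + longer > max_length:
        budget = max_length - len(prompt_ids)
        chosen_ids = chosen_ids[:budget]     # both responses get the same budget
        rejected_ids = rejected_ids[:budget]
    return prompt_ids + chosen_ids, prompt_ids + rejected_ids
```

Because the budget is derived from the longer response, both sequences share the same prompt and both respect max_length.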

Could you also let us know why the cutoff length has been implemented this way? Is this a commonly used method for DPO?

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Jun 20, 2024
@niravlg niravlg changed the title from "Cutoff Length only followed for chosen response in LLaMA-Factory DPO" to "Cutoff Length only followed for chosen response in Pairwise Data for DPO" Jun 20, 2024
FangLi1 commented Jun 21, 2024

follow+1

@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Jun 30, 2024
Owner
hiyouga commented Jun 30, 2024

Thank you for raising this issue; we have fixed it in the latest version.

PrimaLuz pushed a commit to PrimaLuz/LLaMA-Factory that referenced this issue Jul 1, 2024
Deprecate reserved_label_len arg