When training DeepSeek-series models on an NPU, the flash-attn library is required, but a conflict makes it unusable on the NPU, so training fails. Would you consider adding support? The forward pass would likely need to be switched to the NPU flash-attention operator: https://www.hiascend.com/document/detail/zh/Pytorch/60RC1/ptmoddevg/trainingmigrguide/performance_tuning_0027.html
llamafactory-cli train
Hoping that training DeepSeek on NPU can be supported.
I looked into the code: the parts mainly involved are LlamaFlashAttention2 and LlamaSdpaAttention in longlora.py. Following https://www.hiascend.com/document/detail/zh/Pytorch/60RC1/ptmoddevg/trainingmigrguide/performance_tuning_0027.html, the flash-attention calls at lines 516 and 531 of the LlamaFlashAttention2 code in transformers would need to be replaced with the torch_npu.npu_fusion_attention operator from that document; the same applies to the SDPA path. It should be feasible to support this.
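The replacement described above could be sketched as a small dispatch helper. This is a minimal sketch, not the actual LLaMA-Factory patch: the torch_npu.npu_fusion_attention argument names and return layout here are assumptions taken from the linked Ascend guide, and the function name npu_friendly_attention is hypothetical.

```python
import math
import torch

def npu_friendly_attention(query, key, value, attention_mask=None):
    """Hedged sketch of swapping flash-attn for the NPU fused operator.

    Tensors are assumed to be in (batch, num_heads, seq_len, head_dim)
    layout. Falls back to PyTorch's built-in scaled dot-product attention
    when torch_npu is absent, so the same path also runs on CPU/GPU.
    """
    try:
        import torch_npu  # only present in Ascend NPU builds of PyTorch

        head_num = query.shape[1]  # number of attention heads ("N" in BNSD)
        # Signature assumed from the Ascend migration guide; the first
        # element of the returned tuple is the attention output.
        out = torch_npu.npu_fusion_attention(
            query, key, value, head_num,
            input_layout="BNSD",
            atten_mask=attention_mask,
            scale=1.0 / math.sqrt(query.shape[-1]),
        )[0]
        return out
    except ImportError:
        # Portable fallback: PyTorch >= 2.0 built-in SDPA
        return torch.nn.functional.scaled_dot_product_attention(
            query, key, value, attn_mask=attention_mask
        )
```

On a non-NPU host the fallback branch runs, which makes the patch testable before it ever touches Ascend hardware.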
The model's modeling_deepseek.py also needs to be changed. The main work is a GPU-to-NPU model migration: https://www.hiascend.com/document/detail/zh/Pytorch/60RC1/ptmoddevg/trainingmigrguide/PT_LMTMOG_0016.html
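The migration step above is mostly a matter of targeting an "npu" device when the Ascend PyTorch adapter is installed. A minimal sketch, assuming the standard torch_npu import pattern from the linked guide (the helper name pick_device is hypothetical):

```python
import torch

def pick_device() -> torch.device:
    """Choose an Ascend NPU device when available, else CUDA, else CPU.

    Importing torch_npu registers the "npu" device type with PyTorch;
    on hosts without the adapter the import fails and we fall back.
    """
    try:
        import torch_npu  # noqa: F401  # Ascend adapter, NPU builds only
        return torch.device("npu:0")
    except ImportError:
        if torch.cuda.is_available():
            return torch.device("cuda:0")
        return torch.device("cpu")
```

Moving a model is then the usual `model.to(pick_device())`, which keeps a single training entry point working across GPU and NPU hosts.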