SFT + freeze training of internlm2-base-7b fails with RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn #4101

1737686924 · 2024-06-06T04:53:07Z

Reminder

I have read the README and searched the existing issues.

System Info

ASCEND_RT_VISIBLE_DEVICES=0,1 deepspeed --num_gpus 2 src/train.py \
    --deepspeed examples/deepspeed/ds_z3_offload_config.json \
    --stage sft \
    --do_train true \
    --model_name_or_path /data/applications/lmd-formal/backend/BaseModels/internlm2-base-7b \
    --dataset identity,alpaca_en_demo \
    --template intern2 \
    --finetuning_type freeze \
    --freeze_trainable_layers 8 \
    --freeze_trainable_modules all \
    --use_llama_pro true \
    --output_dir saves/internlm2-base-7b/sft/freeze \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 1 \
    --save_steps 100 \
    --eval_steps 100 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --learning_rate 1e-4 \
    --num_train_epochs 3.0 \
    --val_size 0.001 \
    --ddp_timeout 180000000 \
    --plot_loss \
    --fp16

SFT training with finetuning_type freeze on internlm2-base-7b fails with: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

transformers=0.41.2
torch=2.2.0
torch-npu=2.2.0

Reproduction

Running the command above (SFT + freeze on internlm2-base-7b) raises: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
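
For readers unfamiliar with this message, the following is a minimal sketch of how plain PyTorch produces it. It is a generic illustration, not a diagnosis of this issue: it assumes a toy `nn.Linear` model rather than internlm2-base-7b, it does not involve DeepSpeed or NPU devices, and the helper names `count_trainable` and `try_backward` are hypothetical. It shows that when no parameter (and no input) requires grad, the loss has no `grad_fn` and `backward()` raises this error, and that counting trainable parameters is a quick way to check for that situation.

```python
from typing import Optional

import torch
from torch import nn


def count_trainable(model: nn.Module) -> int:
    """Return the number of parameter elements that still require grad."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


def try_backward(model: nn.Module, inputs: torch.Tensor) -> Optional[str]:
    """Run one forward/backward pass; return the error message if it fails."""
    loss = model(inputs).sum()
    try:
        loss.backward()
    except RuntimeError as err:
        return str(err)
    return None


# Toy stand-in for a real model (assumption: any nn.Module behaves the same way).
model = nn.Linear(4, 2)
inputs = torch.randn(3, 4)  # plain input tensor, requires_grad is False

# Freeze every parameter: nothing in the graph requires grad any more.
for param in model.parameters():
    param.requires_grad_(False)
frozen_trainable = count_trainable(model)
frozen_error = try_backward(model, inputs)

# Unfreeze one tensor: the loss now has a grad_fn and backward() succeeds.
model.weight.requires_grad_(True)
unfrozen_trainable = count_trainable(model)
unfrozen_error = try_backward(model, inputs)

print("frozen:", frozen_trainable, "trainable elements, error:", frozen_error)
print("unfrozen:", unfrozen_trainable, "trainable elements, error:", unfrozen_error)
```

If a check like `count_trainable` reports zero on the actual model right before training starts, the freeze settings did not leave any parameter trainable; if it reports a non-zero count, the cause lies elsewhere.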

Expected behavior

No response

Others

No response

hiyouga added the "pending" label (This problem is yet to be addressed) on Jun 6, 2024

hiyouga added the "npu" label (This problem is related to NPU devices) on Jun 19, 2024
