Reminder
System Info
llamafactory version: 0.8.3.dev0
Reproduction
!NCCL_P2P_DISABLE=1 NCCL_IB_DISABLE=1 deepspeed --include="localhost:0" src/train.py \
--deepspeed /home/LLaMA-Factory-latest/examples/deepspeed/ds_z3_offload_config.json \
--stage sft \
--do_train True \
--model_name_or_path /home/models_dir/glm-4-9b \
--preprocessing_num_workers 4 \
--finetuning_type lora \
--template glm4 \
--flash_attn auto \
--dataset identity \
--dataset_dir data \
--cutoff_len 1024 \
--learning_rate 5e-05 \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 1 \
--lr_scheduler_type cosine \
--max_grad_norm 1.0 \
--logging_steps 10 \
--save_steps 1 \
--eval_steps 1 \
--val_size 0.1 \
--evaluation_strategy epoch \
--save_strategy epoch \
--save_total_limit 1 \
--optim adamw_torch \
--packing True \
--upcast_layernorm False \
--output_dir /home/LLaMA-Factory-latest/save/lora/sft/train_2407051411 \
--ddp_timeout 180000000 \
--overwrite_cache \
--overwrite_output_dir \
--quantization_bit 4 \
--low_cpu_mem_usage False
Expected behavior
I have tried #3062 and #1130, and I want to know how to solve the problem. Thanks!
Others
[2024-07-05 06:53:04,664] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-05 06:53:05,218] [WARNING] [runner.py:202:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-07-05 06:53:05,229] [INFO] [runner.py:568:main] cmd = /opt/conda/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None src/train.py --deepspeed /home/LLaMA-Factory-latest/examples/deepspeed/ds_z3_offload_config.json --stage sft --do_train True --model_name_or_path /home/models_dir/glm-4-9b --preprocessing_num_workers 4 --finetuning_type lora --template glm4 --flash_attn auto --dataset identity --dataset_dir data --cutoff_len 1024 --learning_rate 5e-05 --num_train_epochs 1 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --gradient_accumulation_steps 1 --lr_scheduler_type cosine --max_grad_norm 1.0 --logging_steps 10 --save_steps 1 --eval_steps 1 --val_size 0.1 --evaluation_strategy epoch --save_strategy epoch --save_total_limit 1 --optim adamw_torch --packing True --upcast_layernorm False --output_dir /home/LLaMA-Factory-latest/save/lora/sft/train_2407051411 --ddp_timeout 180000000 --overwrite_cache --overwrite_output_dir --quantization_bit 4 --low_cpu_mem_usage False
[2024-07-05 06:53:08,836] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-05 06:53:09,417] [INFO] [launch.py:138:main] 0 NCCL_IB_DISABLE=1
[2024-07-05 06:53:09,417] [INFO] [launch.py:138:main] 0 NCCL_P2P_DISABLE=1
[2024-07-05 06:53:09,417] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.17.1-1+cuda12.1
[2024-07-05 06:53:09,417] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_VERSION=2.17.1-1
[2024-07-05 06:53:09,417] [INFO] [launch.py:138:main] 0 NCCL_VERSION=2.17.1-1
[2024-07-05 06:53:09,417] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev
[2024-07-05 06:53:09,417] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE=libnccl2=2.17.1-1+cuda12.1
[2024-07-05 06:53:09,417] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE_NAME=libnccl2
[2024-07-05 06:53:09,417] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE_VERSION=2.17.1-1
[2024-07-05 06:53:09,417] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0]}
[2024-07-05 06:53:09,417] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=1, node_rank=0
[2024-07-05 06:53:09,417] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2024-07-05 06:53:09,417] [INFO] [launch.py:163:main] dist_world_size=1
[2024-07-05 06:53:09,417] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0
[2024-07-05 06:53:09,418] [INFO] [launch.py:253:main] process 758460 spawned with command: ['/opt/conda/bin/python', '-u', 'src/train.py', '--local_rank=0', '--deepspeed', '/home/LLaMA-Factory-latest/examples/deepspeed/ds_z3_offload_config.json', '--stage', 'sft', '--do_train', 'True', '--model_name_or_path', '/home/models_dir/glm-4-9b', '--preprocessing_num_workers', '4', '--finetuning_type', 'lora', '--template', 'glm4', '--flash_attn', 'auto', '--dataset', 'identity', '--dataset_dir', 'data', '--cutoff_len', '1024', '--learning_rate', '5e-05', '--num_train_epochs', '1', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--gradient_accumulation_steps', '1', '--lr_scheduler_type', 'cosine', '--max_grad_norm', '1.0', '--logging_steps', '10', '--save_steps', '1', '--eval_steps', '1', '--val_size', '0.1', '--evaluation_strategy', 'epoch', '--save_strategy', 'epoch', '--save_total_limit', '1', '--optim', 'adamw_torch', '--packing', 'True', '--upcast_layernorm', 'False', '--output_dir', '/home/LLaMA-Factory-latest/save/lora/sft/train_2407051411', '--ddp_timeout', '180000000', '--overwrite_cache', '--overwrite_output_dir', '--quantization_bit', '4', '--low_cpu_mem_usage', 'False']
[2024-07-05 06:53:15,370] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/opt/conda/lib/python3.10/site-packages/transformers/training_args.py:1474: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
warnings.warn(
[2024-07-05 06:53:19,402] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-05 06:53:19,403] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
07/05/2024 06:53:19 - WARNING - llamafactory.hparams.parser - We recommend enable `upcast_layernorm` in quantized training.
07/05/2024 06:53:19 - WARNING - llamafactory.hparams.parser - We recommend enable mixed precision training.
07/05/2024 06:53:19 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
07/05/2024 06:53:19 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, compute dtype: None
[INFO|tokenization_utils_base.py:2106] 2024-07-05 06:53:19,507 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:2106] 2024-07-05 06:53:19,507 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2106] 2024-07-05 06:53:19,507 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2106] 2024-07-05 06:53:19,507 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2106] 2024-07-05 06:53:19,507 >> loading file tokenizer.json
[WARNING|logging.py:314] 2024-07-05 06:53:20,366 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
07/05/2024 06:53:20 - INFO - llamafactory.data.template - Add <|user|>,<|observation|> to stop words.
07/05/2024 06:53:20 - INFO - llamafactory.data.loader - Loading dataset identity.json...
Converting format of dataset (num_proc=4): 100%|█| 91/91 [00:00<00:00, 487.86 ex
Running tokenizer on dataset (num_proc=4): 100%|█| 91/91 [00:10<00:00, 8.34 exa
[INFO|configuration_utils.py:731] 2024-07-05 06:53:35,706 >> loading configuration file /home/models_dir/glm-4-9b/config.json
[INFO|configuration_utils.py:731] 2024-07-05 06:53:35,709 >> loading configuration file /home/models_dir/glm-4-9b/config.json
[INFO|configuration_utils.py:796] 2024-07-05 06:53:35,711 >> Model config ChatGLMConfig {
"_name_or_path": "/home/models_dir/glm-4-9b",
"add_bias_linear": false,
"add_qkv_bias": true,
"apply_query_key_layer_scaling": true,
"apply_residual_connection_post_layernorm": false,
"architectures": [
"ChatGLMModel"
],
"attention_dropout": 0.0,
"attention_softmax_in_fp32": true,
"auto_map": {
"AutoConfig": "configuration_chatglm.ChatGLMConfig",
"AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForCausalLM": "modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForSequenceClassification": "modeling_chatglm.ChatGLMForSequenceClassification"
},
"bias_dropout_fusion": true,
"classifier_dropout": null,
"eos_token_id": [
151329,
151336,
151338
],
"ffn_hidden_size": 13696,
"fp32_residual_connection": false,
"hidden_dropout": 0.0,
"hidden_size": 4096,
"kv_channels": 128,
"layernorm_epsilon": 1.5625e-07,
"model_type": "chatglm",
"multi_query_attention": true,
"multi_query_group_num": 2,
"num_attention_heads": 32,
"num_layers": 40,
"original_rope": true,
"pad_token_id": 151329,
"padded_vocab_size": 151552,
"post_layer_norm": true,
"rmsnorm": true,
"rope_ratio": 1,
"seq_length": 8192,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.41.2",
"use_cache": true,
"vocab_size": 151552
}
07/05/2024 06:53:35 - INFO - llamafactory.model.model_utils.quantization - Quantizing model to 4 bit with bitsandbytes.
[INFO|quantizer_bnb_4bit.py:244] 2024-07-05 06:53:35,781 >> The device_map was not initialized. Setting device_map to {'':torch.cuda.current_device()}. If you want to use the model for inference, please set device_map ='auto'
[INFO|modeling_utils.py:3471] 2024-07-05 06:53:35,782 >> loading weights file /home/models_dir/glm-4-9b/model.safetensors.index.json
[INFO|modeling_utils.py:1519] 2024-07-05 06:53:35,782 >> Instantiating ChatGLMForConditionalGeneration model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:962] 2024-07-05 06:53:35,784 >> Generate config GenerationConfig {
"eos_token_id": [
151329,
151336,
151338
],
"pad_token_id": 151329
}
Loading checkpoint shards: 0%| | 0/10 [00:00<?, ?it/s]
/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:2025: UserWarning: for transformer.encoder.layers.0.self_attention.query_key_value.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:2025: UserWarning: for transformer.encoder.layers.0.self_attention.query_key_value.bias: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:2025: UserWarning: for transformer.encoder.layers.0.self_attention.dense.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:2025: UserWarning: for transformer.encoder.layers.0.mlp.dense_h_to_4h.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:2025: UserWarning: for transformer.encoder.layers.0.mlp.dense_4h_to_h.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:2025: UserWarning: for transformer.encoder.layers.1.self_attention.query_key_value.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:2025: UserWarning: for transformer.encoder.layers.1.self_attention.query_key_value.bias: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:2025: UserWarning: for transformer.encoder.layers.1.self_attention.dense.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:2025: UserWarning: for transformer.encoder.layers.1.mlp.dense_h_to_4h.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:2025: UserWarning: for transformer.encoder.layers.39.self_attention.query_key_value.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:2025: UserWarning: for transformer.encoder.layers.39.self_attention.query_key_value.bias: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:2025: UserWarning: for transformer.encoder.layers.39.self_attention.dense.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:2025: UserWarning: for transformer.encoder.layers.39.mlp.dense_h_to_4h.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:2025: UserWarning: for transformer.encoder.layers.39.mlp.dense_4h_to_h.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
Loading checkpoint shards: 100%|████████████████| 10/10 [00:02<00:00, 3.63it/s]
[INFO|modeling_utils.py:4280] 2024-07-05 06:53:42,079 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.
[INFO|modeling_utils.py:4288] 2024-07-05 06:53:42,080 >> All the weights of ChatGLMForConditionalGeneration were initialized from the model checkpoint at /home/models_dir/glm-4-9b.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMForConditionalGeneration for predictions without further training.
[INFO|modeling_utils.py:3797] 2024-07-05 06:53:42,084 >> Generation config file not found, using a generation config created from the model config.
07/05/2024 06:53:42 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
07/05/2024 06:53:42 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
07/05/2024 06:53:42 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
07/05/2024 06:53:42 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
07/05/2024 06:53:42 - INFO - llamafactory.model.model_utils.misc - Found linear modules: dense_h_to_4h,dense,query_key_value,dense_4h_to_h
07/05/2024 06:53:42 - INFO - llamafactory.model.loader - trainable params: 21,176,320 || all params: 33,894,891,520 || trainable%: 0.0625
[INFO|deepspeed.py:328] 2024-07-05 06:53:42,880 >> Detected ZeRO Offload and non-DeepSpeed optimizers: This combination should work as long as the custom optimizer has both CPU and GPU implementation (except LAMB)
Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu121/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.567744731903076 seconds
Adam Optimizer #0 is created with AVX2 arithmetic capability.
Config: alpha=0.000050, betas=(0.900000, 0.999000), weight_decay=0.010000, adam_w=1
[2024-07-05 06:53:47,936] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.14.0, git-hash=unknown, git-branch=unknown
Traceback (most recent call last):
File "/home/LLaMA-Factory-latest/src/train.py", line 28, in <module>
main()
File "/home/LLaMA-Factory-latest/src/train.py", line 19, in main
run_exp()
File "/home/LLaMA-Factory-latest/src/llamafactory/train/tuner.py", line 50, in run_exp
run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
File "/home/LLaMA-Factory-latest/src/llamafactory/train/sft/workflow.py", line 90, in run_sft
train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1885, in train
return inner_training_loop(
File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2042, in _inner_training_loop
model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
File "/opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py", line 1284, in prepare
result = self._prepare_deepspeed(*args)
File "/opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py", line 1751, in _prepare_deepspeed
engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
File "/opt/conda/lib/python3.10/site-packages/deepspeed/__init__.py", line 176, in initialize
engine = DeepSpeedEngine(args=args,
File "/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 262, in __init__
self._configure_distributed_model(model)
File "/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1112, in _configure_distributed_model
self.module.to(self.device)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1160, in to
return self._apply(convert)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
[Previous line repeated 6 more times]
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 833, in _apply
param_applied = fn(param)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1158, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 324, in to
return self._quantize(device)
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 288, in _quantize
w = self.data.contiguous().cuda(device)
NotImplementedError: Cannot copy out of meta tensor; no data!
[2024-07-05 06:53:51,461] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 758460