Issues: NVIDIA/Megatron-LM
How to convert megatron T5 model to huggingface T5 model? [stale: No activity in 60 days on issue or PR] #178 opened Jan 20, 2022 by ZikaiGuo
How about supporting alternatives to fine-tuning? [stale] #114 opened Jul 6, 2021 by hwijeen
[QUESTION] which script to use to launch llama 2 fine tuning? [stale] #528 opened Oct 4, 2023 by ppt0011
[QUESTION] Seeking documentation/paper for expert and context parallel [stale] #562 opened Oct 26, 2023 by mayank31398
[BUG] SwitchMLP is not compatible with distributed_optimizer, but there is no assertion. [stale] #565 opened Oct 30, 2023 by huangyf530
[REGRESSION] MoEs are obtaining higher loss than they should during training #894 opened Jun 27, 2024 by kiddyboots216
[QUESTION] Hello, when finetune llama2-7B, it's error: RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle) [stale] #549 opened Oct 17, 2023 by 13416157913
--no-query-key-layer-scaling makes no difference [stale] #279 opened Feb 8, 2023 by felix-schneider
Batch_input and elapsed time per iteration slow down during model training #897 opened Jun 29, 2024 by Yuhanleeee
How to calculate FLOPS? [stale] #76 opened Mar 9, 2021 by ShivanshuPurohit
why rank 0 consumes more gpu memory than other ranks within single machine [stale] #78 opened Mar 10, 2021 by huangjundashuaige
Vocabulary size of released pretrained BERT does not match the provided vocabulary file [stale] #85 opened Mar 16, 2021 by rhythmswing
loss curve in pretraining BERT is very strange [stale] #86 opened Mar 16, 2021 by zjujh1995
bert_dataset.py - ValueError: Seed must be between 0 and 2**32 - 1 [stale] #88 opened Mar 23, 2021 by bugface
AttributeError: module 'torch' has no attribute '_amp_foreach_non_finite_check_and_unscale_' [stale] #90 opened Apr 8, 2021 by Carolingliang
Can not create embeddings from Megatron [stale] #91 opened Apr 17, 2021 by Benan-Akca
Do we need to use preprocess.py before loading dataset? [stale] #94 opened Apr 20, 2021 by Benan-Akca
Squad format Dataset for Supervised Finetuning and Evaluation of ORQA task. [stale] #124 opened Jul 21, 2021 by DevavratSinghBisht
Processing data about T5 [stale] #110 opened Jun 11, 2021 by Hanlard
The training of T5 using FP16 is unstable [stale] #115 opened Jul 6, 2021 by zhuhong
In BERT pretraining how to specify DATA_PATH to take multiple files [stale] #117 opened Jul 7, 2021 by armundle