Issues: NVIDIA/Megatron-LM
Open issues (stale = no activity in 60 days on issue or PR):
#800 [QUESTION] bf16 Parameters and fp32 Gradients (stale), opened Apr 30, 2024 by pluiez
#787 [QUESTION] Validation loss & PPL keep going up (stale), opened Apr 20, 2024 by zhentingqi
#785 [QUESTION] Is it expected to do grad norm on dense-optimizer and moe-optimizer respectively?, opened Apr 19, 2024 by ezioliao
#778 [BUG] The bug about the options of the Megatron-core, transformer-impl and flash-attention (stale), opened Apr 12, 2024 by Baibaifan
#776 [BUG] ConstantGradScaler and loss-scale argument not match (stale), opened Apr 12, 2024 by BeingGod
#774 [BUG] Passed the wrong type of argument to torch.distributed.broadcast (stale), opened Apr 11, 2024 by sandyhouse
#773 [QUESTION] vicuna-7b-v1.5 weight conversion from huggingface to megatron-lm format, opened Apr 10, 2024 by uehara-mech
#770 [QUESTION] Why megatron-core seems slower and uses more GPU memory than legacy for gpt_pretrain? (stale), opened Apr 9, 2024 by REIGN12
#769 [QUESTION] Why replace F.embedding() with [] on VocabParallelEmbedding class? (stale), opened Apr 9, 2024 by starkhu
#767 [BUG] How to checkpoint the specific microbatch in pipeline parallelism? (stale), opened Apr 7, 2024 by robotsp
#766 [BUG] Bug of expert model parallel (stale), opened Apr 7, 2024 by 1049451037
#763 [BUG] ModuleNotFoundError: No module named 'megatron.training.tokenizer'; 'megatron.training' is not a package (stale), opened Apr 2, 2024 by hellangleZ
#761 MoE training loss inconsistent after resume from old checkpoint (stale), opened Apr 1, 2024 by guozhen1997
#756 [QUESTION] Training Mixtral 8x7B on 16 x H100 only achieves low throughput of 130 TFLOPS, opened Mar 30, 2024 by ShinoharaHare
#754 Loss mask uses torch.float32 instead of bool (stale), opened Mar 29, 2024 by pilot7747
#749 [BUG] ModuleNotFoundError: No module named 'scaled_softmax_cuda' (stale), opened Mar 23, 2024 by liuliuliu0605
#746 [QUESTION] Why does Megatron-LM use the gloo backend when creating parallel groups? (stale), opened Mar 21, 2024 by wuyingjun-lucky
#744 [QUESTION] In RotaryEmbedding, should the datatype of inv_freq and the corresponding sin/cos computations be maintained as torch.float32? (stale), opened Mar 21, 2024 by rchardx
#742 [QUESTION] Why does it take so much time to sync up barrier information between ranks? (stale), opened Mar 20, 2024 by yanminjia