Issues: NVIDIA/Megatron-LM

Issues list

[QUESTION] bf16 Parameters and fp32 Gradients (label: stale)
#800 opened Apr 30, 2024 by pluiez
[QUESTION] Validation loss & PPL keep going up (label: stale)
#787 opened Apr 20, 2024 by zhentingqi
[BUG] ConstantGradScaler and loss-scale argument not match (label: stale)
#776 opened Apr 12, 2024 by BeingGod
[BUG] Passed the wrong type of argument to torch.distributed.broadcast. (label: stale)
#774 opened Apr 11, 2024 by sandyhouse
[QUESTION]why replace F.embedding() with [] on VocabParallelEmbedding class? (label: stale)
#769 opened Apr 9, 2024 by starkhu
[BUG] How to checkpoint the specific microbatch in pipeline parallelism? (label: stale)
#767 opened Apr 7, 2024 by robotsp
[BUG] Bug of expert model parallel (label: stale)
#766 opened Apr 7, 2024 by 1049451037
MOE training Loss inconsistent after resume from old checkpoint (label: stale)
#761 opened Apr 1, 2024 by guozhen1997
Loss mask uses torch.float32 instead of bool (label: stale)
#754 opened Mar 29, 2024 by pilot7747
[BUG] ModuleNotFoundError: No module named 'scaled_softmax_cuda' (label: stale)
#749 opened Mar 23, 2024 by liuliuliu0605
[BUG]
#743 opened Mar 20, 2024 by lakshya-4gp
[QUESTION] Why take too much time to sync up barrier information between ranks (label: stale)
#742 opened Mar 20, 2024 by yanminjia