Issues: NVIDIA/Megatron-LM

Issues list

[QUESTION] Seeking documentation/paper for expert and context parallel (labels: stale)
#562 opened Oct 26, 2023 by mayank31398
how to export the gpt2 to onnx model? (labels: stale)
#99 opened May 7, 2021 by HuuY
the issue of GPU utilize (labels: stale)
#73 opened Mar 2, 2021 by ywd-pku
GLUE tasks for BERT (labels: stale)
#74 opened Mar 8, 2021 by casually-PYlearner
How to calculate FLOPS? (labels: stale)
#76 opened Mar 9, 2021 by ShivanshuPurohit
Fused kernel compilation could get stuck (labels: bug, stale)
#82 opened Mar 14, 2021 by rhythmswing
loss curve in pretraining BERT is very strange (labels: stale)
#86 opened Mar 16, 2021 by zjujh1995
Support LAMB optimizer (labels: stale)
#87 opened Mar 23, 2021 by bugface
bert_dataset.py - ValueError: Seed must be between 0 and 2**32 - 1 (labels: stale)
#88 opened Mar 23, 2021 by bugface
Can not create embeddings from Megatron (labels: stale)
#91 opened Apr 17, 2021 by Benan-Akca
Add new attention features on megatron to optimize the performance. (labels: stale)
#106 opened May 22, 2021 by rainmaker712
Distributed training all-reduce order (labels: stale)
#107 opened May 31, 2021 by zhiqi-0
Unclear description of ICT pretraining (labels: stale)
#108 opened Jun 7, 2021 by hangzhang-nlp
Preocessing data about T5 (labels: stale)
#110 opened Jun 11, 2021 by Hanlard
Problems about model parallel (labels: stale)
#111 opened Jun 11, 2021 by Shaw95
AttributeError: 'Parameter' object has no attribute 'main_grad' (labels: stale)
#112 opened Jun 27, 2021 by xyltt
The training of T5 using FP16 is unstable (labels: stale)
#115 opened Jul 6, 2021 by zhuhong
In BERT pretraining how to specify DATA_PATH to take multiple files (labels: stale)
#117 opened Jul 7, 2021 by armundle
Do we need to use preprocess.py before loading dataset? (labels: stale)
#94 opened Apr 20, 2021 by Benan-Akca