Issues: NVIDIA/Megatron-LM
Issues list
[QUESTION] Seeking documentation/paper for expert and context parallel
stale
No activity in 60 days on issue or PR
#562
opened Oct 26, 2023 by
mayank31398
how to export the gpt2 to onnx model?
stale
#99
opened May 7, 2021 by
HuuY
the issue of GPU utilize
stale
#73
opened Mar 2, 2021 by
ywd-pku
GLUE tasks for BERT
stale
#74
opened Mar 8, 2021 by
casually-PYlearner
How to calculate FLOPS?
stale
#76
opened Mar 9, 2021 by
ShivanshuPurohit
why rank 0 consumes more gpu memory than other ranks within single machine
stale
#78
opened Mar 10, 2021 by
huangjundashuaige
Fused kernel compilation could get stuck
bug
Something isn't working
stale
#82
opened Mar 14, 2021 by
rhythmswing
Vocabulary size of released pretrained BERT does not match the provided vocabulary file
stale
#85
opened Mar 16, 2021 by
rhythmswing
loss curve in pretraining BERT is very strange
stale
#86
opened Mar 16, 2021 by
zjujh1995
Support LAMB optimizer
stale
#87
opened Mar 23, 2021 by
bugface
bert_dataset.py - ValueError: Seed must be between 0 and 2**32 - 1
stale
#88
opened Mar 23, 2021 by
bugface
AttributeError: module 'torch' has no attribute '_amp_foreach_non_finite_check_and_unscale_'
stale
#90
opened Apr 8, 2021 by
Carolingliang
Can not create embeddings from Megatron
stale
#91
opened Apr 17, 2021 by
Benan-Akca
Add new attention features on megatron to optimize the performance.
stale
#106
opened May 22, 2021 by
rainmaker712
Distributed training all-reduce order
stale
#107
opened May 31, 2021 by
zhiqi-0
Unclear description of ICT pretraining
stale
#108
opened Jun 7, 2021 by
hangzhang-nlp
Preprocessing data about T5
stale
#110
opened Jun 11, 2021 by
Hanlard
Problems about model parallel
stale
#111
opened Jun 11, 2021 by
Shaw95
AttributeError: 'Parameter' object has no attribute 'main_grad'
stale
#112
opened Jun 27, 2021 by
xyltt
Batch_input and elapsed time per iteration slow down during model training
#897
opened Jun 29, 2024 by
Yuhanleeee
The training of T5 using FP16 is unstable
stale
#115
opened Jul 6, 2021 by
zhuhong
In BERT pretraining how to specify DATA_PATH to take multiple files
stale
#117
opened Jul 7, 2021 by
armundle
Squad format Dataset for Supervised Finetuning and Evaluation of ORQA task.
stale
#124
opened Jul 21, 2021 by
DevavratSinghBisht
Do we need to use preprocess.py before loading dataset?
stale
#94
opened Apr 20, 2021 by
Benan-Akca