Issues: NVIDIA/Megatron-LM
Issues list
AttributeError: module 'torch' has no attribute '_amp_foreach_non_finite_check_and_unscale_'
stale
No activity in 60 days on issue or PR
#90
opened Apr 8, 2021 by
Carolingliang
In BERT pretraining how to specify DATA_PATH to take multiple files
stale
No activity in 60 days on issue or PR
#117
opened Jul 7, 2021 by
armundle
The training of T5 using FP16 is unstable
stale
No activity in 60 days on issue or PR
#115
opened Jul 6, 2021 by
zhuhong
How about supporting alternatives to fine-tuning?
stale
No activity in 60 days on issue or PR
#114
opened Jul 6, 2021 by
hwijeen
AttributeError: 'Parameter' object has no attribute 'main_grad'
stale
No activity in 60 days on issue or PR
#112
opened Jun 27, 2021 by
xyltt
Problems about model parallel
stale
No activity in 60 days on issue or PR
#111
opened Jun 11, 2021 by
Shaw95
Preprocessing data about T5
stale
No activity in 60 days on issue or PR
#110
opened Jun 11, 2021 by
Hanlard
Unclear description of ICT pretraining
stale
No activity in 60 days on issue or PR
#108
opened Jun 7, 2021 by
hangzhang-nlp
Distributed training all-reduce order
stale
No activity in 60 days on issue or PR
#107
opened May 31, 2021 by
zhiqi-0
Add new attention features on megatron to optimize the performance.
stale
No activity in 60 days on issue or PR
#106
opened May 22, 2021 by
rainmaker712
how to export the gpt2 to onnx model?
stale
No activity in 60 days on issue or PR
#99
opened May 7, 2021 by
HuuY
Can not create embeddings from Megatron
stale
No activity in 60 days on issue or PR
#91
opened Apr 17, 2021 by
Benan-Akca
Further pretraining using BERT-base weights
stale
No activity in 60 days on issue or PR
#182
opened Jan 25, 2022 by
genesith
bert_dataset.py - ValueError: Seed must be between 0 and 2**32 - 1
stale
No activity in 60 days on issue or PR
#88
opened Mar 23, 2021 by
bugface
Support LAMB optimizer
stale
No activity in 60 days on issue or PR
#87
opened Mar 23, 2021 by
bugface
loss curve in pretraining BERT is very strange
stale
No activity in 60 days on issue or PR
#86
opened Mar 16, 2021 by
zjujh1995
Vocabulary size of released pretrained BERT does not match the provided vocabulary file
stale
No activity in 60 days on issue or PR
#85
opened Mar 16, 2021 by
rhythmswing
Fused kernel compilation could get stuck
bug
Something isn't working
stale
No activity in 60 days on issue or PR
#82
opened Mar 14, 2021 by
rhythmswing
why rank 0 consumes more gpu memory than other ranks within single machine
stale
No activity in 60 days on issue or PR
#78
opened Mar 10, 2021 by
huangjundashuaige
How to calculate FLOPS?
stale
No activity in 60 days on issue or PR
#76
opened Mar 9, 2021 by
ShivanshuPurohit
GLUE tasks for BERT
stale
No activity in 60 days on issue or PR
#74
opened Mar 8, 2021 by
casually-PYlearner
the issue of GPU utilization
stale
No activity in 60 days on issue or PR
#73
opened Mar 2, 2021 by
ywd-pku
Do we need to use preprocess.py before loading dataset?
stale
No activity in 60 days on issue or PR
#94
opened Apr 20, 2021 by
Benan-Akca