Issues: NVIDIA/Megatron-LM
[QUESTION] When pretraining BERT, meet bug: cuBLAS Error: the requested functionality is not supported (#876, opened Jun 18, 2024 by shanyuaa; updated Jul 3, 2024)
[QUESTION] What's the internal difference for training when setting only "fp8-format" or setting "fp8-format"+"bf16" (#883, opened Jun 21, 2024 by dong-liuliu; updated Jul 3, 2024)
[QUESTION] Why is TELayerNormColumnParallelLinear used instead of TEColumnParallelLinear in gpt_layer_specs (#884, opened Jun 21, 2024 by clarence-lee-sheng; updated Jul 3, 2024)
[QUESTION] When will model have _extra_state? (#900, opened Jul 3, 2024 by 1049451037; updated Jul 3, 2024)
[ENHANCEMENT] Enhance data efficiency with efficient sequence packing [stale: no activity in 60 days on issue or PR] (#478, opened Aug 24, 2023 by Barber0; updated Jul 3, 2024)
[QUESTION] Does Megatron-LM support Flash Attention for BERT and T5 pretraining? (#899, opened Jul 2, 2024 by Leo-T-Zang; updated Jul 2, 2024)
[QUESTION] How to pre-build the dataset's index? [stale] (#795, opened Apr 24, 2024 by etiennemlb; updated Jul 2, 2024)
[BUG] Bug of expert model parallel [stale] (#766, opened Apr 7, 2024 by 1049451037; updated Jun 30, 2024)
[BUG] When loading from checkpoint to continue training, it will hang during the validation's forward. (#647, opened Dec 28, 2023 by young-chao; updated Jun 29, 2024)
[QUESTION] bf16 Parameters and fp32 Gradients [stale] (#800, opened Apr 30, 2024 by pluiez; updated Jun 29, 2024)
Batch_input and elapsed time per iteration slow down during model training (#897, opened Jun 29, 2024 by Yuhanleeee; updated Jun 29, 2024)
[BUGS] Pipeline Parallelism fails/hangs with Megatron Core example (#881, opened Jun 20, 2024 by schheda1; updated Jun 28, 2024)
[REGRESSION] MoEs are obtaining higher loss than they should during training (#894, opened Jun 27, 2024 by kiddyboots216; updated Jun 28, 2024)
[BUG] @jit_fuser fails with Unknown type constructor Sequence (#880, opened Jun 20, 2024 by Edenzzzz; updated Jun 28, 2024)
[BUG] Question about helpers.cpp in version core_v0.7.0 (#896, opened Jun 28, 2024 by longzhang418; updated Jun 28, 2024)
[QUESTION] Does Megatron-LM support P100? (#849, opened May 29, 2024 by gaokaiz2; updated Jun 28, 2024)
[BUG] AttributeError: module 'transformer_engine' has no attribute 'pytorch' [stale] (#696, opened Feb 19, 2024 by zhentingqi; updated Jun 27, 2024)
[QUESTION] Getting tools/preprocess_data.py to work is painful (#892, opened Jun 26, 2024 by sambar1729; updated Jun 26, 2024)
[QUESTION] Sample idx, bin files in public domain for trying out pretrain_gpt.py? (#891, opened Jun 26, 2024 by sambar1729; updated Jun 26, 2024)
[QUESTION] Has standalone_embedding_stage been supported yet in core? (#890, opened Jun 26, 2024 by JiwenJ; updated Jun 26, 2024)
[BUG] NCCL TIMEOUT (maybe ALLREDUCE?) (#735, opened Mar 14, 2024 by ZhangEnmao; updated Jun 25, 2024)
[QUESTION] Zarr-based strategies will not be registered because of missing packages (#689, opened Feb 5, 2024 by ZhangEnmao; updated Jun 24, 2024)
How about supporting alternatives to fine-tuning? [stale] (#114, opened Jul 6, 2021 by hwijeen; updated Jun 22, 2024)
[QUESTION] Is it expected to do grad norm on dense-optimizer and moe-optimizer respectively? (#785, opened Apr 19, 2024 by ezioliao; updated Jun 20, 2024)