Issues: NVIDIA/Megatron-LM
[QUESTION] When pretraining BERT, meet bug: cuBLAS Error: the requested functionality is not supported (#876, opened Jun 18, 2024 by shanyuaa; updated Jul 3, 2024)
[QUESTION] What's the internal difference for training when setting only "fp8-format" or setting "fp8-format"+"bf16" (#883, opened Jun 21, 2024 by dong-liuliu; updated Jul 3, 2024)
[QUESTION] Why is TELayerNormColumnParallelLinear used instead of TEColumnParallelLinear in gpt_layer_specs (#884, opened Jun 21, 2024 by clarence-lee-sheng; updated Jul 3, 2024)
[QUESTION] When will model have _extra_state? (#900, opened Jul 3, 2024 by 1049451037; updated Jul 3, 2024)
[ENHANCEMENT] Enhance data efficiency with efficient sequence packing [stale: no activity in 60 days on issue or PR] (#478, opened Aug 24, 2023 by Barber0; updated Jul 3, 2024)
[QUESTION] Does Megatron-LM support Flash Attention for BERT and T5 pretraining? (#899, opened Jul 2, 2024 by Leo-T-Zang; updated Jul 2, 2024)
[QUESTION] How to pre-build the dataset's index? [stale] (#795, opened Apr 24, 2024 by etiennemlb; updated Jul 2, 2024)
[BUG] Bug of expert model parallel [stale] (#766, opened Apr 7, 2024 by 1049451037; updated Jun 30, 2024)
[BUG] When loading from checkpoint to continue training, it will hang during the validation's forward. (#647, opened Dec 28, 2023 by young-chao; updated Jun 29, 2024)
[QUESTION] bf16 Parameters and fp32 Gradients [stale] (#800, opened Apr 30, 2024 by pluiez; updated Jun 29, 2024)
Batch_input and elapsed time per iteration slow down during model training (#897, opened Jun 29, 2024 by Yuhanleeee; updated Jun 29, 2024)
[BUGS] Pipeline Parallelism fails/hangs with Megatron Core example (#881, opened Jun 20, 2024 by schheda1; updated Jun 28, 2024)
[REGRESSION] MoEs are obtaining higher loss than they should during training (#894, opened Jun 27, 2024 by kiddyboots216; updated Jun 28, 2024)
[BUG] @jit_fuser fails with Unknown type constructor Sequence (#880, opened Jun 20, 2024 by Edenzzzz; updated Jun 28, 2024)
[BUG] Question about helpers.cpp in version core_v0.7.0 (#896, opened Jun 28, 2024 by longzhang418; updated Jun 28, 2024)
[QUESTION] Does Megatron-LM support P100? (#849, opened May 29, 2024 by gaokaiz2; updated Jun 28, 2024)
[BUG] AttributeError: module 'transformer_engine' has no attribute 'pytorch' [stale] (#696, opened Feb 19, 2024 by zhentingqi; updated Jun 27, 2024)
[QUESTION] Getting tools/preprocess_data.py to work is painful (#892, opened Jun 26, 2024 by sambar1729; updated Jun 26, 2024)
[QUESTION] Sample idx, bin files in public domain for trying out pretrain_gpt.py? (#891, opened Jun 26, 2024 by sambar1729; updated Jun 26, 2024)
[QUESTION] Has standalone_embedding_stage been supported yet in core? (#890, opened Jun 26, 2024 by JiwenJ; updated Jun 26, 2024)
[BUG] NCCL TIMEOUT (maybe ALLREDUCE?) (#735, opened Mar 14, 2024 by ZhangEnmao; updated Jun 25, 2024)
[QUESTION] Zarr-based strategies will not be registered because of missing packages (#689, opened Feb 5, 2024 by ZhangEnmao; updated Jun 24, 2024)
How about supporting alternatives to fine-tuning? [stale] (#114, opened Jul 6, 2021 by hwijeen; updated Jun 22, 2024)
[QUESTION] Is it expected to do grad norm on dense-optimizer and moe-optimizer respectively? (#785, opened Apr 19, 2024 by ezioliao; updated Jun 20, 2024)