Issues: NVIDIA/Megatron-LM
Open issues (stale = no activity in 60 days on issue or PR):
#800 [QUESTION] bf16 Parameters and fp32 Gradients (stale), opened Apr 30, 2024 by pluiez
#787 [QUESTION] Validation loss & PPL keep going up (stale), opened Apr 20, 2024 by zhentingqi
#785 [QUESTION] Is it expected to do grad norm on dense-optimizer and moe-optimizer respectively?, opened Apr 19, 2024 by ezioliao
#778 [BUG] The bug about the options of the Megatron-core, transformer-impl and flash-attention (stale), opened Apr 12, 2024 by Baibaifan
#776 [BUG] ConstantGradScaler and loss-scale argument not match (stale), opened Apr 12, 2024 by BeingGod
#774 [BUG] Passed the wrong type of argument to torch.distributed.broadcast (stale), opened Apr 11, 2024 by sandyhouse
#773 [QUESTION] vicuna-7b-v1.5 weight conversion from huggingface to megatron-lm format, opened Apr 10, 2024 by uehara-mech
#770 [QUESTION] Why megatron-core seems slower and uses more GPU memory than legacy for gpt_pretrain? (stale), opened Apr 9, 2024 by REIGN12
#769 [QUESTION] Why replace F.embedding() with [] on VocabParallelEmbedding class? (stale), opened Apr 9, 2024 by starkhu
#767 [BUG] How to checkpoint the specific microbatch in pipeline parallelism? (stale), opened Apr 7, 2024 by robotsp
#766 [BUG] Bug of expert model parallel (stale), opened Apr 7, 2024 by 1049451037
#763 [BUG] ModuleNotFoundError: No module named 'megatron.training.tokenizer'; 'megatron.training' is not a package (stale), opened Apr 2, 2024 by hellangleZ
#761 MoE training loss inconsistent after resume from old checkpoint (stale), opened Apr 1, 2024 by guozhen1997
#756 [QUESTION] Training Mixtral 8x7B on 16 x H100 only achieves low throughput of 130 TFLOPS, opened Mar 30, 2024 by ShinoharaHare
#754 Loss mask uses torch.float32 instead of bool (stale), opened Mar 29, 2024 by pilot7747
#749 [BUG] ModuleNotFoundError: No module named 'scaled_softmax_cuda' (stale), opened Mar 23, 2024 by liuliuliu0605
#746 [QUESTION] Why does Megatron-LM use the gloo backend when creating parallel groups? (stale), opened Mar 21, 2024 by wuyingjun-lucky
#744 [QUESTION] In RotaryEmbedding, should the datatype of inv_freq and the corresponding sin/cos computations be maintained as torch.float32? (stale), opened Mar 21, 2024 by rchardx
#742 [QUESTION] Why does it take so much time to sync up barrier information between ranks? (stale), opened Mar 20, 2024 by yanminjia