NVIDIA / Megatron-LM Public

Issues: NVIDIA/Megatron-LM

302 Open 275 Closed

Issues list

How to convert megatron T5 model to huggingface T5 model? stale

No activity in 60 days on issue or PR

#178 opened Jan 20, 2022 by ZikaiGuo

How about supporting alternatives to fine-tuning? stale

#114 opened Jul 6, 2021 by hwijeen

[QUESTION] which script to use to launch llama 2 fine tuning? stale

#528 opened Oct 4, 2023 by ppt0011

Huggingface <-> Megatron-LM Compatibility

#37 opened Jul 6, 2020 by usuyama

[QUESTION] Seeking documentation/paper for expert and context parallel stale

#562 opened Oct 26, 2023 by mayank31398

[BUG] SwitchMLP is not compatible with distributed_optimizer, but there is no assertion. stale

#565 opened Oct 30, 2023 by huangyf530

[BUG] @jit_fuser fails with Unknown type constructor Sequence

#880 opened Jun 20, 2024 by Edenzzzz

[QUESTION]Where does the attention_mask come from when the gpt_model is not the first or last pipeline stage?

#861 opened Jun 8, 2024 by janelu9

[REGRESSION] MoEs are obtaining higher loss than they should during training

#894 opened Jun 27, 2024 by kiddyboots216

[QUESTION] Hello, when finetune llama2-7B, it's error: RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle) stale

#549 opened Oct 17, 2023 by 13416157913

--no-query-key-layer-scaling makes no difference stale

#279 opened Feb 8, 2023 by felix-schneider

[BUG] Example of pretraining BERT does not work

#791 opened Apr 23, 2024 by xju2

Batch_input and elapsed time per iteration slow down during model training

#897 opened Jun 29, 2024 by Yuhanleeee

How to calculate FLOPS? stale

#76 opened Mar 9, 2021 by ShivanshuPurohit

why rank 0 comsumes more gpu memory than other ranks within single machine stale

#78 opened Mar 10, 2021 by huangjundashuaige

Vocabulary size of released pretrained BERT does not match the provided vocabulary file stale

#85 opened Mar 16, 2021 by rhythmswing

loss curve in pretraining BERT is very strange stale

#86 opened Mar 16, 2021 by zjujh1995

bert_dataset.py - ValueError: Seed must be between 0 and 2**32 - 1 stale

#88 opened Mar 23, 2021 by bugface

AttributeError: module 'torch' has no attribute '_amp_foreach_non_finite_check_and_unscale_' stale

#90 opened Apr 8, 2021 by Carolingliang

Can not create embeddings from Megatron stale

#91 opened Apr 17, 2021 by Benan-Akca

Do we need to use preprocess.py before loading dataset? stale

#94 opened Apr 20, 2021 by Benan-Akca

Squad format Dataset for Supervised Finetuning and Evaluation of ORQA task. stale

#124 opened Jul 21, 2021 by DevavratSinghBisht

Preocessing data about T5 stale

#110 opened Jun 11, 2021 by Hanlard

The training of T5 using FP16 is unstable stale

#115 opened Jul 6, 2021 by zhuhong

In BERT pretraining how to specify DATA_PATH to take multiple files stale

#117 opened Jul 7, 2021 by armundle
