Issues: NVIDIA/Megatron-LM

Issues list

How to convert megatron T5 model to huggingface T5 model? [stale: No activity in 60 days on issue or PR]
#178 opened Jan 20, 2022 by ZikaiGuo
How about supporting alternatives to fine-tuning? [stale]
#114 opened Jul 6, 2021 by hwijeen
[QUESTION] which script to use to launch llama 2 fine tuning? [stale]
#528 opened Oct 4, 2023 by ppt0011
[QUESTION] Seeking documentation/paper for expert and context parallel [stale]
#562 opened Oct 26, 2023 by mayank31398
--no-query-key-layer-scaling makes no difference [stale]
#279 opened Feb 8, 2023 by felix-schneider
How to calculate FLOPS? [stale]
#76 opened Mar 9, 2021 by ShivanshuPurohit
loss curve in pretraining BERT is very strange [stale]
#86 opened Mar 16, 2021 by zjujh1995
bert_dataset.py - ValueError: Seed must be between 0 and 2**32 - 1 [stale]
#88 opened Mar 23, 2021 by bugface
Can not create embeddings from Megatron [stale]
#91 opened Apr 17, 2021 by Benan-Akca
Do we need to use preprocess.py before loading dataset? [stale]
#94 opened Apr 20, 2021 by Benan-Akca
Preocessing data about T5 [stale]
#110 opened Jun 11, 2021 by Hanlard
The training of T5 using FP16 is unstable [stale]
#115 opened Jul 6, 2021 by zhuhong
In BERT pretraining how to specify DATA_PATH to take multiple files [stale]
#117 opened Jul 7, 2021 by armundle