Pull requests: NVIDIA/Megatron-LM

Pull requests list

modify typo in megatron/core/models/bert/bert_model.py
#888 opened Jun 24, 2024 by wplf
modify typo in bert_model.py
#887 opened Jun 24, 2024 by wplf
Rename the correct variable of seed
#886 opened Jun 23, 2024 by FancyXun
OPTIM get_batch traffic when enable context-parallel
#885 opened Jun 22, 2024 by Superkeyv
[BUG] Wrong lr multiplier
#882 opened Jun 21, 2024 by artyomtugaryov
[bug] fix xavier uniform init for output layers
#814 opened May 8, 2024 by hjlee1371
Support for Megatron-VLM training
#806 opened May 5, 2024 by 1049451037
Add dataset packing
#802 opened May 2, 2024 by shamanez
fix finalize_model_grads when sp is on
#798 opened Apr 29, 2024 by zhaoyinglia [stale: no activity in 60 days]
Speed up the creation of attention mask
#797 opened Apr 29, 2024 by yuantailing
Fix incorrect src argument in broadcast_params function
#796 opened Apr 26, 2024 by Yuxin-CV [stale: no activity in 60 days]
fix loading distributed checkpoint when enable auto-detect-ckpt-format but disable use-dist-ckpt
#794 opened Apr 24, 2024 by imh966 [stale: no activity in 60 days]
modifed the model parreleized gpt pre-trainign script
#789 opened Apr 22, 2024 by shamanez [stale: no activity in 60 days]
forward step missing arg
#784 opened Apr 18, 2024 by malay-nagda [stale: no activity in 60 days]
fix a mistake when check if num_layers dividable by vpp
#781 opened Apr 16, 2024 by constroy [stale: no activity in 60 days]
Update pretrain_bert.py
#772 opened Apr 9, 2024 by ocryptocode [stale: no activity in 60 days]
[very simple change] Remove duplicated code
#765 opened Apr 3, 2024 by NoelBird [stale: no activity in 60 days]
fix new bucket when param require new bucket
#762 opened Apr 2, 2024 by wangxicoding [stale: no activity in 60 days]
Updated fused_kernels import path
#760 opened Mar 31, 2024 by Yazeed7 [stale: no activity in 60 days]
use new methods for communication
#758 opened Mar 30, 2024 by mayank31398 [stale: no activity in 60 days]