Lower memory spike when loading with ISQ on CUDA #433

Merged: 4 commits merged into master from isq_lower_mem_usage on Jun 14, 2024

EricLBuehler
Owner

Currently, loading with ISQ (in-situ quantization) causes a spike in GPU memory usage. This happens because tensors are copied to the GPU asynchronously and then quantized, so copies and quantization can overlap. Forcing a GPU <-> CPU synchronization ensures that each copy has completed and that operations do not overlap, which should reduce the spike.
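To illustrate the reasoning, here is a toy accounting model in plain Rust. It is not the code from this PR and does not touch CUDA. The names `Layer`, `uniform_layers`, `peak_device_bytes`, and the `max_in_flight` parameter are hypothetical and exist only for this sketch. It assumes the host can queue up to `max_in_flight` unquantized uploads before the oldest one is quantized and freed, and that synchronizing after every tensor corresponds to `max_in_flight = 1`.

```rust
use std::collections::VecDeque;

/// Hypothetical per-layer sizes: bytes before and after quantization.
struct Layer {
    full_bytes: u64,
    quantized_bytes: u64,
}

/// Build `n` identical layers (illustration only).
fn uniform_layers(n: usize, full_bytes: u64, quantized_bytes: u64) -> Vec<Layer> {
    (0..n)
        .map(|_| Layer { full_bytes, quantized_bytes })
        .collect()
}

/// Quantize one resident layer: the quantized copy is allocated while the
/// full-precision copy still exists, then the full-precision copy is freed.
fn quantize_and_free(resident: &mut u64, peak: &mut u64, layer: &Layer) {
    *resident += layer.quantized_bytes;
    *peak = (*peak).max(*resident);
    *resident -= layer.full_bytes;
}

/// Peak device bytes in the toy model, where at most `max_in_flight`
/// unquantized tensors may be resident before the oldest is quantized.
/// `max_in_flight = 1` models a synchronization after every tensor.
fn peak_device_bytes(layers: &[Layer], max_in_flight: usize) -> u64 {
    assert!(max_in_flight >= 1, "at least one tensor must be in flight");
    let mut resident = 0u64;
    let mut peak = 0u64;
    let mut pending: VecDeque<&Layer> = VecDeque::new();

    for layer in layers {
        // Upload: the unquantized tensor now occupies device memory.
        resident += layer.full_bytes;
        peak = peak.max(resident);
        pending.push_back(layer);

        // Once the in-flight limit is reached, the oldest upload is quantized.
        if pending.len() >= max_in_flight {
            if let Some(done) = pending.pop_front() {
                quantize_and_free(&mut resident, &mut peak, done);
            }
        }
    }

    // Drain whatever is still waiting to be quantized.
    while let Some(done) = pending.pop_front() {
        quantize_and_free(&mut resident, &mut peak, done);
    }
    peak
}
```

With four layers of 100 bytes that quantize to 25 bytes each, this model gives a peak of 200 bytes when every upload is followed by a synchronization (`max_in_flight = 1`) and 425 bytes when all four uploads may overlap (`max_in_flight = 4`). Both cases end with the same 100 bytes resident. Real behavior depends on the allocator and on how far the host runs ahead of the device, which this sketch does not model.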

Code Metrics Report
  ===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 Dockerfile              1           34           25            0            9
 Happy                   1          442          369            0           73
 JSON                    9           21           21            0            0
 Python                 31         1217         1038           37          142
 TOML                   16          440          400            1           39
-------------------------------------------------------------------------------
 Jupyter Notebooks       1            0            0            0            0
 |- Markdown             1           60           30           22            8
 |- Python               1           96           87            1            8
 (Total)                            156          117           23           16
-------------------------------------------------------------------------------
 Markdown               16         1135            0          836          299
 |- BASH                 5          100           97            0            3
 |- Python               6          122          110            0           12
 |- Rust                 2           80           72            3            5
 (Total)                           1437          279          839          319
-------------------------------------------------------------------------------
 Rust                  115        34379        31132          584         2663
 |- Markdown            57          643           13          596           34
 (Total)                          35022        31145         1180         2697
===============================================================================
 Total                 191        37668        32985         1458         3225
===============================================================================
  

@EricLBuehler EricLBuehler changed the title from "Lower memory usage when loading with ISQ" to "Lower memory spike when loading with ISQ on CUDA" Jun 14, 2024
@EricLBuehler EricLBuehler merged commit 6648673 into master Jun 14, 2024
11 checks passed
@EricLBuehler EricLBuehler deleted the isq_lower_mem_usage branch June 14, 2024 21:10