[go: nahoru, domu]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forward fill! to parent #1888

Closed
charleskawczynski opened this issue Jul 18, 2024 · 1 comment
Closed

Forward fill! to parent #1888

charleskawczynski opened this issue Jul 18, 2024 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments

@charleskawczynski
Copy link
Member

Our fill! method follows our copyto! pattern, which uses CartesianIndexing, which is suboptimal especially in the context of fill!, so, we should just forward all of our fill! methods to fill!(parent(data), val). Here's a local benchmark on an A100 at our target resolution:

julia> @benchmark CUDA.@sync fill!(X.x1, 1)
BenchmarkTools.Trial: 2911 samples with 1 evaluation.
 Range (min  max):  52.190 μs  2.701 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):      2.440 ms             ┊ GC (median):    0.00%
 Time  (mean ± σ):    1.715 ms ± 1.091 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █     ▁▃ ▁                                           ▇█▇▅▃ ▁
  █▁▁▁▁▁██▇█▆█▅▇█▆▅▄▄▅▇▃▃▃▅▆▃▁▁▃▁▃▃▅▄▁▇▁▃▁▁▁▁▁▁▁▃▃▇▄▅▅▆█████ █
  52.2 μs      Histogram: log(frequency) by time     2.62 ms <

 Memory estimate: 592 bytes, allocs estimate: 19.

julia> @benchmark CUDA.@sync fill!(parent(X.x1), 1)
BenchmarkTools.Trial: 4535 samples with 1 evaluation.
 Range (min  max):  20.560 μs  2.682 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     25.200 μs             ┊ GC (median):    0.00%
 Time  (mean ± σ):    1.101 ms ± 1.185 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █     ▁▂                                             ▅▅▆▄▂ ▁
  █▁▁▁▄▁██▇█▆▆▇▇▇▇▅▁▅▃▇▁▃▄▄▅▁▃▄▅▃▁▁▁▁▄▇▁▃▃▁▁▁▃▁▃▁▃▇▆▅▄▄█████ █
  20.6 μs      Histogram: log(frequency) by time     2.59 ms <

 Memory estimate: 128 bytes, allocs estimate: 7.
@charleskawczynski charleskawczynski added the enhancement New feature or request label Jul 18, 2024
@charleskawczynski charleskawczynski self-assigned this Jul 18, 2024
@charleskawczynski
Copy link
Member Author

We really just need to fix #1910, since the implementation in Base/CUDA is likely using linear indexing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant