From this very hacked branch:
https://github.com/CliMA/ClimaCore.jl/tree/ck/drop_field_dimension (PR #1929).
We can reach good bandwidth efficiency when combining linear indexing with dropped field dimensions on ClimaCore broadcasted objects for pointwise kernels (this is the `thermo_bench_bw.jl` benchmark script):

Main branch (Clima A100):

Branch with dropped field dimension + linear indexing (Clima A100):

One future-proofing complication of this branch is that we will need to continue to support the field dimension being present (perhaps inside `TupleOfArrays`, or whatever we decide to call this new layer's struct) in order to still work reasonably with on the order of 100 tracers.

Just to note: dropping the field dimension improved performance by roughly `2x`, and using linear indexing accounted for the rest. As discussed with @tapios, applying only linear indexing seems to improve performance for broadcasting with single variables, but seems to degrade performance with multiple variables. So it seems that both changes are needed in tandem to improve performance.

cc @tapios
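For readers unfamiliar with the two changes, here is a minimal CPU-only sketch of the idea using plain Julia arrays. It is not the code from the branch and does not use any ClimaCore API: the function names, the three-field kernel, and the array sizes are all hypothetical and chosen only for illustration. It contrasts a layout that keeps a field dimension (indexed with Cartesian indices plus a field index) with one array per field (indexed with a single linear index).

```julia
# Hypothetical stand-ins for illustration only; these are not ClimaCore types.
# Layout A: a single array with a trailing field dimension, size (Ni, Nj, Nf).
# Layout B: one array per field (field dimension dropped), each of size (Ni, Nj).

# Pointwise kernel on layout A: Cartesian indices plus an explicit field index.
function pointwise_with_field_dim!(out, data)
    Ni, Nj, _ = size(data)
    @inbounds for j in 1:Nj, i in 1:Ni
        out[i, j] = data[i, j, 1] * data[i, j, 2] + data[i, j, 3]
    end
    return out
end

# Same kernel on layout B: one linear index shared by all arrays.
function pointwise_linear!(out, fields::NTuple{3,<:AbstractArray})
    a, b, c = fields
    @inbounds for n in eachindex(out, a, b, c)
        out[n] = a[n] * b[n] + c[n]
    end
    return out
end

Ni, Nj = 4, 5
data = rand(Float32, Ni, Nj, 3)
# Copy each field into its own contiguous array (the "dropped field dimension" layout).
fields = ntuple(f -> data[:, :, f], 3)

out_a = pointwise_with_field_dim!(zeros(Float32, Ni, Nj), data)
out_b = pointwise_linear!(zeros(Float32, Ni, Nj), fields)
```

Both functions compute the same result; the sketch only shows the difference in layout and indexing. It says nothing about GPU performance, which is what the benchmark in this issue measures.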