[go: nahoru, domu]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spend less wasted time zeroing out ImageBuf #3754

Merged
merged 1 commit into from
Jan 18, 2023

Conversation

lgritz
Copy link
Collaborator
@lgritz lgritz commented Jan 18, 2023

Examining some timings and function call counts, I saw that a typical oiiotool command was spending a surprising amount of time in ImageBufAlgo::zero(), not just more runtime overall, but also calling it many more times than I thought it should. For example,

oiiotool -create 4096x4096 3 -colorconvert srgb acescg

called ImageBufAlgo::zero()... 4 times, totaling as much time as the colorconvert operation.

The idiom used in IBA::op(dst, src, ...) which initializes the dst IB if it's not already initialized will call the IB constructor that zeros out its contents (so you don't have every new IB full of garbage pixel values). But for IBA functions, this is almost never necessary, since the IBA function itself will tend to overwrite all pixel values. Not always, of course -- since IBA functions let you restrict the spatial area or channel range -- but in the cases where the destination starts out uninitialized and is allocated to exactly mtch the ROI, I think you can count on the calling IBA function filling in all those values.

So I changed IBAPrep() to call the IB constructor that says not to zero out the pixels.

Just in case we ever need the ability, I made a new flag value that lets you call IBAPrep and tell it, yes, do zero out any newly allocated buffers. But we pass the testsuite without needing that flag anywhere, so I think we're ok.

Also, the --create command itself would make new buffers (with the implied zeroing), and then explicitly zero it again!

So after these changes, the oiiotool example command line above only calls IBA::zero() ONCE (correctly, in the initial new image --create makes), eliminating three redundant calls.

I also hypothesized that each call to zero was taking too much time, because underneath it was calling IBA::fill which laboriously walked through the IB with iterators and copying a fixed pixel value that happened to contain 0 values. So I rewrote a special case where if the buffer had contiguous strides, it would just use memset instead. Surprisingly, it was not appreciably faster, and I'm guessing that the memory bandwidth necessary to touch every pixel in a 4k buffer is the limiting factor, and the other pixel-by-pixel iterator diddling was inconsequential compared to waiting for the memory bus. I'm still going to leave this "optimization" in, even though it was barely measurable for the case I was trying.

Examining some timings and function call counts, I saw that a typical
oiiotool command was spending a surprising amount of time in
ImageBufAlgo::zero(), not just more runtime overall, but also calling
it many more times than I thought it should.  For example,

    oiiotool -create 4096x4096 3 -colorconvert srgb acescg

called ImageBufAlgo::zero()... 4 times, totaling as much time as the
colorconvert operation.

The idiom used in `IBA::op(dst, src, ...)` which initializes the `dst`
IB if it's not already initialized will call the IB constructor that
zeros out its contents (so you don't have every new IB full of garbage
pixel values). But for IBA functions, this is almost never necessary,
since the IBA function itself will tend to overwrite all pixel values.
Not always, of course -- since IBA functions let you restrict the
spatial area or channel range -- but in the cases where the
destination starts out uninitialized and is allocated to exactly mtch
the ROI, I *think* you can count on the calling IBA function filling
in all those values.

So I changed IBAPrep() to call the IB constructor that says not to
zero out the pixels.

Just in case we ever need the ability, I made a new flag value that
lets you call IBAPrep and tell it, yes, do zero out any newly
allocated buffers. But we pass the testsuite without needing that flag
anywhere, so I think we're ok.

Also, the --create command itself would make new buffers (with the
implied zeroing), and then explicitly zero it again!

So after these changes, the oiiotool example command line above only
calls IBA::zero() ONCE (correctly, in the initial new image --create
makes), eliminating three redundant calls.

I also hypothesized that each call to zero was taking too much time,
because underneath it was calling IBA::fill which laboriously walked
through the IB with iterators and copying a fixed pixel value that
happened to contain 0 values. So I rewrote a special case where if the
buffer had contiguous strides, it would just use memset instead.
Surprisingly, it was not appreciably faster, and I'm guessing that the
memory bandwidth necessary to touch every pixel in a 4k buffer is the
limiting factor, and the other pixel-by-pixel iterator diddling was
inconsequential compared to waiting for the memory bus. I'm still
going to leave this "optimization" in, even though it was barely
measurable for the case I was trying.
@lgritz lgritz merged commit 59f54c8 into AcademySoftwareFoundation:master Jan 18, 2023
@lgritz lgritz deleted the lg-zero branch January 18, 2023 07:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant