Spend less wasted time zeroing out ImageBuf #3754
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Examining some timings and function call counts, I saw that a typical oiiotool command was spending a surprising amount of time in ImageBufAlgo::zero(), not just more runtime overall, but also calling it many more times than I thought it should. For example,
called ImageBufAlgo::zero()... 4 times, totaling as much time as the colorconvert operation.
The idiom used in
IBA::op(dst, src, ...)
which initializes thedst
IB if it's not already initialized will call the IB constructor that zeros out its contents (so you don't have every new IB full of garbage pixel values). But for IBA functions, this is almost never necessary, since the IBA function itself will tend to overwrite all pixel values. Not always, of course -- since IBA functions let you restrict the spatial area or channel range -- but in the cases where the destination starts out uninitialized and is allocated to exactly mtch the ROI, I think you can count on the calling IBA function filling in all those values.So I changed IBAPrep() to call the IB constructor that says not to zero out the pixels.
Just in case we ever need the ability, I made a new flag value that lets you call IBAPrep and tell it, yes, do zero out any newly allocated buffers. But we pass the testsuite without needing that flag anywhere, so I think we're ok.
Also, the --create command itself would make new buffers (with the implied zeroing), and then explicitly zero it again!
So after these changes, the oiiotool example command line above only calls IBA::zero() ONCE (correctly, in the initial new image --create makes), eliminating three redundant calls.
I also hypothesized that each call to zero was taking too much time, because underneath it was calling IBA::fill which laboriously walked through the IB with iterators and copying a fixed pixel value that happened to contain 0 values. So I rewrote a special case where if the buffer had contiguous strides, it would just use memset instead. Surprisingly, it was not appreciably faster, and I'm guessing that the memory bandwidth necessary to touch every pixel in a 4k buffer is the limiting factor, and the other pixel-by-pixel iterator diddling was inconsequential compared to waiting for the memory bus. I'm still going to leave this "optimization" in, even though it was barely measurable for the case I was trying.