Opening full file is slow and useless #27

Open
maxired opened this issue May 2, 2014 · 5 comments
Comments

maxired commented May 2, 2014

Hi,

when we give a filename to the library, it uses fs.readFile to open the file.
It looks like the whole file is read, which is slow for big pictures.

I suggest that we first read the header of the picture and then read only the bytes corresponding to the Exif part.

I would love to submit a pull request for this, but since there are no unit tests (issue #26), I am not sure I would not break anything.

I am thinking about something like:

var fs = require('fs');

var firstReadLength = 6;
fs.open(image, 'r', function (err, fd) {
  var data = new Buffer(firstReadLength);
  // Read the SOI marker plus the following marker and segment length
  fs.read(fd, data, 0, firstReadLength, null, function (err, bytesRead, buffer) {
    if (buffer[0] == 0xFF && buffer[1] == 0xD8) {      // JPEG SOI
      if (buffer[2] == 0xFF && buffer[3] == 0xE1) {    // APP1 (Exif) marker
        var exifLength = buffer.readUInt16BE(4);
        var exifBuffer = new Buffer(exifLength);
        // Read only the APP1 segment, which starts right after these 6 bytes
        fs.read(fd, exifBuffer, 0, exifLength, firstReadLength, function (err, bytesRead, buffer) {
          processImage(buffer, callback);
        });
      }
    }
  });
});

In my case, this led to a significant performance improvement: more than 6x with pictures of around 7 MB each (don't forget to drop your cache when doing this kind of performance analysis):

 echo 3 | sudo tee /proc/sys/vm/drop_caches
maxired changed the title from "Performance improvement" to "Opening full file is slow and useless" on May 2, 2014
@cirocosta

That really matters for big images. 👍

@machunter

Yes, I think this is important when you are processing more than one image.

@lacombar

@maxired : you have to be careful about testing performance with a cold cache, because your numbers will be polluted by the time needed for node to bootstrap itself:

$ echo 3 | sudo tee /proc/sys/vm/drop_caches
$ time node /dev/null
node /dev/null  0.16s user 0.04s system 12% cpu 1.538 total
$ time node /dev/null
node /dev/null  0.16s user 0.01s system 99% cpu 0.176 total

@lacombar

@maxired btw, I implemented a different solution: instead of reading only the exact amount of data needed, I load data chunk by chunk until the JPEG APP1 Exif segment is fully available (a rough sketch of this approach is shown after the stats below). This has the advantage of minimizing the number of round trips to the kernel, whereas your solution implies a lot of them. It turns out that the time to read a large JPEG is pretty minimal. As a reference, I am using a 24 MB post-Photoshop JPEG which came out of the new 50-megapixel Canon 5DS; here are some stats (all numbers are in ms, measured from before the creation of the ExifImage object to the execution of the callback).

x time.base
+ time.1k
* time.64k
% time.128k
# time.1m
@ time.128m
+--------------------------------------------------------------------------+
|     *+OOOO#OO+*     #+       xxOx@    @     x * @                       x|
||_|____|MAA|A_|_________|__|___|M____A_A______|__|                        |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  11           412           827           437     478.36364     121.87229
+  11           188           598           226     264.72727     117.91954
Difference at 95.0% confidence
    -213.636 +/- 106.659
    -44.6598% +/- 22.2966%
    (Student's t, pooled s = 119.912)
*  11           171           573           202     239.36364     113.74117
Difference at 95.0% confidence
    -239 +/- 104.848
    -49.962% +/- 21.9181%
    (Student's t, pooled s = 117.877)
%  11           189           241           212     215.54545      19.48006
Difference at 95.0% confidence
    -262.818 +/- 77.6249
    -54.9411% +/- 16.2272%
    (Student's t, pooled s = 87.2706)
#  11           196           328           219     226.27273     38.113228
Difference at 95.0% confidence
    -252.091 +/- 80.3128
    -52.6986% +/- 16.7891%
    (Student's t, pooled s = 90.2925)
@   4           436           600           501         496.5     74.496085
No difference proven at 95.0% confidence

The suffix indicates the chunk size.

The following command was used to generate the numbers:

for f in $(seq 0 10); do
        echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null;
        ~/exif-image ~/canon-5ds-sample-image-full.jpg;
done

The test machine itself is running an old i3-2377M.
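
A rough sketch of the chunked approach described above might look like the following. The chunk size, the readExifSegment name, and the error handling are illustrative assumptions rather than code from this repository, and, like the snippet earlier in this thread, it only handles the simple case where APP1 is the first segment after SOI.

var fs = require('fs');

var CHUNK_SIZE = 64 * 1024; // one of the chunk sizes benchmarked above (assumption)

// Read the file chunk by chunk until the whole APP1 (Exif) segment is buffered,
// then hand only that segment to the caller.
function readExifSegment(path, callback) {
  fs.open(path, 'r', function (err, fd) {
    if (err) return callback(err);
    var buffered = new Buffer(0);

    function readMore() {
      var chunk = new Buffer(CHUNK_SIZE);
      fs.read(fd, chunk, 0, CHUNK_SIZE, buffered.length, function (err, bytesRead) {
        if (err) return callback(err);
        buffered = Buffer.concat([buffered, chunk.slice(0, bytesRead)]);

        // SOI (0xFFD8) followed by the APP1 marker (0xFFE1); the 16-bit big-endian
        // length at offset 4 tells us how large the APP1 segment is.
        if (buffered.length >= 6 && buffered[2] === 0xFF && buffered[3] === 0xE1) {
          var segmentEnd = 4 + buffered.readUInt16BE(4);
          if (buffered.length >= segmentEnd) {
            fs.close(fd, function () {});
            return callback(null, buffered.slice(0, segmentEnd));
          }
        }
        if (bytesRead === 0) {
          fs.close(fd, function () {});
          return callback(new Error('no Exif APP1 segment found'));
        }
        readMore();
      });
    }

    readMore();
  });
}

Larger chunks mean fewer reads (and fewer round trips to the kernel) at the cost of reading a bit more data than strictly needed, which is the trade-off the numbers above explore.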

@titarenko

@maxired Exactly, such an approach causes huge memory consumption, ending with an OOM kill, when trying to do batch processing. If you are still interested in a lighter approach, take a look at this module.
