One of the main goals of yauzl is to keep memory usage under control. The current pattern of `entry` and `end` events is hostile to this goal.
Using `examples/unzip.js` on http://wolfesoftware.com/crowded.zip.xz (unxz it first) results in over 2GB of virtual RAM allocation as the program slows to a crawl while creating empty directories. This is failure. The problem is that we call `openReadStream()` (0bf7f48#diff-5b07aa03e052f091324dde1dfbfd25bfR53) on every entry before pumping any of the file data. This is the naive solution to the problem that #18 ran into in the test system, where `autoClose` can lead to race conditions.
Really, the problem is that `autoClose` expects you to call all your `openReadStream()`s before yauzl emits the `end` event, and yauzl races to emit the `end` event as soon as possible. Even with `autoClose` off, you need to keep all your `Entry` objects in RAM as yauzl emits them, or you'll lose the information.
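For reference, the pattern that gets us into trouble looks roughly like this sketch of the current eager-events API (a paraphrase, not the exact code in `examples/unzip.js`):

```js
var yauzl = require("yauzl");
var fs = require("fs");

yauzl.open("crowded.zip", function(err, zipfile) {
  if (err) throw err;
  zipfile.on("entry", function(entry) {
    if (/\/$/.test(entry.fileName)) return; // skip directories for brevity
    // Entry events fire as fast as yauzl can parse the central directory,
    // so every openReadStream() below is queued up front, holding every
    // Entry object and pending pipeline in memory at once.
    zipfile.openReadStream(entry, function(err, readStream) {
      if (err) throw err;
      readStream.pipe(fs.createWriteStream(entry.fileName));
    });
  });
  zipfile.on("end", function() {
    // With autoClose on, this races against the openReadStream() calls above.
  });
});
```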
We need some way to throttle entry reading.
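One possible shape for that throttle is a pull-style API: a `lazyEntries` open option plus a `readEntry()` method, where yauzl emits an `entry` event only when asked for one. These names are assumptions for the sake of the sketch, not an existing API:

```js
var yauzl = require("yauzl");
var fs = require("fs");

// Hypothetical: {lazyEntries: true} suppresses automatic entry events.
yauzl.open("crowded.zip", {lazyEntries: true}, function(err, zipfile) {
  if (err) throw err;
  zipfile.readEntry(); // hypothetical: request exactly one entry
  zipfile.on("entry", function(entry) {
    if (/\/$/.test(entry.fileName)) {
      // Directory entry; nothing to extract.
      return zipfile.readEntry();
    }
    zipfile.openReadStream(entry, function(err, readStream) {
      if (err) throw err;
      readStream.on("end", function() {
        zipfile.readEntry(); // pull the next entry only after this one finishes
      });
      readStream.pipe(fs.createWriteStream(entry.fileName));
    });
  });
});
```

With this shape, only one `Entry` object and one open read stream exist at a time, and the `end` event can't fire until after the final `readEntry()` call, so `autoClose` has nothing to race against.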
Also, it's questionable that yauzl's API encourages clients to extract all files in parallel. This would probably perform worse on any file system that tries to prefetch data ahead of your `read()` calls (pretty much all of them, I think) than the normal strategy of extracting one file at a time from start to finish. I haven't tested that, but it seems like file system access wouldn't necessarily benefit from parallelization in general; in fact, massive parallelization probably makes things much worse due to cache thrashing and the extra resources required to keep all the pipelines open.
It would be nice to remove `examples/unzip.js`'s dependency on `pend`, a third-party synchronization library that is currently being used to iterate over the entries one at a time despite yauzl's API handing us all the entries in parallel (see the sketch below). It's a smell that our own example doesn't use the recommended execution order.
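For context, the `pend` workaround looks roughly like this (a paraphrase, not the exact example code): `pend.max = 1` forces the extraction jobs to run one at a time even though the `entry` events all arrive up front.

```js
var yauzl = require("yauzl");
var fs = require("fs");
var Pend = require("pend");

yauzl.open("crowded.zip", function(err, zipfile) {
  if (err) throw err;
  var pend = new Pend();
  pend.max = 1; // serialize: only one extraction job runs at a time
  zipfile.on("entry", function(entry) {
    if (/\/$/.test(entry.fileName)) return; // skip directories for brevity
    // Every job is still queued immediately, so we still hold every Entry
    // in memory; pend only staggers when the extraction work happens.
    pend.go(function(cb) {
      zipfile.openReadStream(entry, function(err, readStream) {
        if (err) return cb(err);
        var writeStream = fs.createWriteStream(entry.fileName);
        writeStream.on("close", cb);
        readStream.pipe(writeStream);
      });
    });
  });
});
```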