Using mappy across multiple processes #125

marcus1487 · 2018-02-22T22:57:20Z

I am working on incorporating mappy into tombo (for nanopore modified base detection), and I am having some issues using mappy on larger genomes across many cores. Right now I open a new mappy.Aligner object in each Python process (via the multiprocess module), which leaves a large memory footprint for larger genomes. I did this to be safe about using the object across multiple processes, but I am wondering whether there is an existing solution that would allow the in-memory minimap2 index to be shared across multiple Python processes via the mappy API to reduce this footprint.


lh3 · 2018-02-23T00:25:07Z

This will be technically difficult. It is possible to access memory from different processes using shared memory, but implementing that in minimap2 will be nontrivial.

Can you use multiple threads? You can create one mappy.Aligner object and call Aligner.map on different threads, like this (I have not tested this part, though):

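import mappy

# load the index once; this single Aligner is shared by all threads
aligner = mappy.Aligner(fn_index)
# then in each thread, allocate a private buffer and pass it to map()
thr_buf = mappy.ThreadBuffer()
for hit in aligner.map(seq, buf=thr_buf):
    ...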
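
A slightly fuller sketch of the same pattern, for anyone who wants a starting point. The function name `map_reads_threaded` and its parameters are hypothetical, not part of mappy; the only mappy behavior assumed is what is shown above, namely that `Aligner.map` accepts a `buf` keyword and that each thread should own its buffer. The aligner and the buffer factory are passed in as arguments so the function itself has no hard dependency on mappy. Whether this gives a real speedup depends on how much of the alignment work runs outside the Python GIL, which is not verified here.

```python
import threading
from queue import Queue


def map_reads_threaded(aligner, make_buffer, reads, n_threads=4):
    """Map (name, seq) pairs on several threads that share one aligner.

    aligner     -- one shared object exposing map(seq, buf=...),
                   for example a mappy.Aligner
    make_buffer -- zero-argument callable returning a per-thread buffer,
                   for example mappy.ThreadBuffer
    """
    work = Queue()
    for item in reads:
        work.put(item)
    for _ in range(n_threads):
        work.put(None)  # one stop sentinel per worker thread

    results = {}
    results_lock = threading.Lock()

    def worker():
        buf = make_buffer()  # private to this thread, never shared
        while True:
            item = work.get()
            if item is None:
                break
            name, seq = item
            hits = list(aligner.map(seq, buf=buf))
            with results_lock:
                results[name] = hits

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With mappy installed, this would be called as `map_reads_threaded(mappy.Aligner(fn_index), mappy.ThreadBuffer, reads)`, so the index is held in memory only once per process.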

marcus1487 · 2018-02-23T00:36:11Z

I figured this might be quite an undertaking if it were not already built into minimap2/mappy. I am using python's multiprocess module right now, but slimming the memory footprint down for large genomes might warrant a switch to threading.

I will try some tests to see if this is feasible. Thanks for the pointer and use case for mappy.ThreadBuffer.

marcus1487 · 2018-03-07T21:52:01Z

I have tested the interface and it works quite well indeed. I will note that mixing multiprocessing with multithreading was quite a headache, but it is indeed possible. For others that might be trying this, I had to open all multiprocess objects (Pipes in my case) and start all processes before opening any threading objects.
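
The ordering described above can be sketched as follows. This is a toy example, not tombo's code: `_child` and `run_pipeline` are hypothetical names, the child process just doubles numbers, and the `fork` start method is an assumption (POSIX only). A plausible reason the ordering matters, not stated in this thread, is that `fork()` copies only the calling thread, so locks held by other threads at fork time can stay locked in the child.

```python
import multiprocessing as mp
import threading


def _child(conn):
    # Hypothetical worker process: doubles each number it receives.
    while True:
        msg = conn.recv()
        if msg is None:
            break
        conn.send(msg * 2)
    conn.close()


def run_pipeline(values, n_threads=2):
    ctx = mp.get_context("fork")  # assumption: POSIX platform using fork

    # Step 1: open Pipes and start all processes while the parent
    # still has a single thread.
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=_child, args=(child_conn,))
    proc.start()
    child_conn.close()  # the parent keeps only its own end

    # Step 2: only now create any threading objects.
    results = []
    pipe_lock = threading.Lock()

    def worker(chunk):
        for v in chunk:
            with pipe_lock:  # one pipe end is shared, so pair send/recv
                parent_conn.send(v)
                results.append(parent_conn.recv())

    chunks = [values[i::n_threads] for i in range(n_threads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    parent_conn.send(None)  # tell the child process to exit
    proc.join()
    return sorted(results)
```

In a real pipeline the worker threads would call `aligner.map` with their own `mappy.ThreadBuffer` instead of sending numbers through the pipe.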

While it would be nice to have a version that could share state across processes, I think this is a sufficient workaround, so I would consider this issue resolved.

lh3 added the feature-request label Feb 23, 2018

marcus1487 mentioned this issue Feb 27, 2018

Huge memory demand in resquiggle. nanoporetech/tombo#36

Closed

marcus1487 closed this as completed Mar 7, 2018
