
khmer multiprocessing + seqan #655

Open · wants to merge 117 commits into master

Conversation

camillescott (Member)

Multiprocessing for khmer.

Addresses #76 and #92; greatly extends khmer's multiprocessing capabilities.

See #638 for original PR.

See #656 for a pull against the seqan branch which masks those changes and improves readability.

See khmer-metrics for some performance profiling.


From original pull:

Example usage from Python can be found here: https://github.com/camillescott/khmer-metrics/blob/master/test_async_diginorm.py

@ctb @mr-c @luizirber thoughts on the Python interface are welcome. As of now, interaction with processed reads is mediated by an iterator over the output queue, which returns khmer::read_parsers::Read objects.


Design

The current design builds off the following assumptions:

  • Writing to the hashtable is quite fast from one thread.
  • Making hashtable writing threadsafe is not practical with bigcount (testing shows that the locking required negates any benefit), and will remain so until bigcount is replaced by larger bin sizes (or we decide to eschew bigcount when using multiprocessing)
  • The main bottleneck for most tasks is read processing, for example in diginorm, finding median counts, etc., which is generally threadsafe
  • Most of our use cases follow a similar structure of pull in reads, do stuff to them using the hashtable, spit them out (or not) to disk

With these constraints in mind, I have begun by focusing on streaming tasks and taking advantage of asynchronous IO and hashtable access. The basic building block is thus the Async abstract base class, which:

  • Takes a khmer::Hashtable instance on construction
  • Defines a lock-free input queue which is templated, usually understood to take HashIntoType, Read, or const char *.
  • Declares (but leaves undefined) a consume method, which is expected to be threadsafe and pull from the input queue
  • Defines a start(int n_threads) method which launches the specified number of threads running consume
  • Defines a stop() method which stops the running consume threads
  • Defines a number of bookkeeping getters, setters, and boolean statuses for managing thread state
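
The contract above can be sketched in Python as a rough analogue (the real implementation is templated C++ over a boost lock-free queue and takes a khmer::Hashtable on construction; ItemCounter, workers_running, and all method bodies here are illustrative, not the PR's actual code):

```python
import queue
import threading

class Async:
    """Illustrative Python analogue of the C++ Async base class."""

    def __init__(self):
        # bounded input queue, mirroring the 50000-item cap described below
        self.input_queue = queue.Queue(maxsize=50000)
        self._threads = []
        self.workers_running = False  # bookkeeping state, exposed via getters in C++

    def consume(self):
        # Declared but left undefined: subclasses supply a threadsafe loop
        # that pulls items off the input queue.
        raise NotImplementedError

    def start(self, n_threads):
        # launch the specified number of threads, all running consume
        self.workers_running = True
        for _ in range(n_threads):
            t = threading.Thread(target=self.consume)
            t.start()
            self._threads.append(t)

    def stop(self):
        # signal the workers to drain and exit, then wait for them
        self.workers_running = False
        for t in self._threads:
            t.join()
        self._threads = []

class ItemCounter(Async):
    """Toy consumer in the spirit of AsyncSequenceWriter: just counts items
    (the real writer breaks sequences into k-mers and hashes them)."""

    def __init__(self):
        super().__init__()
        self.count = 0
        self._lock = threading.Lock()

    def consume(self):
        # keep draining after stop() is signalled until the queue is empty
        while self.workers_running or not self.input_queue.empty():
            try:
                self.input_queue.get(timeout=0.01)
            except queue.Empty:
                continue
            with self._lock:
                self.count += 1
```

The key shape is that start/stop manage thread lifetime in the base class, while each concrete class only supplies a threadsafe consume loop.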

All the actual async implementations build off this class. For example, the AsyncSequenceWriter inherits from Async<const char *>, and its consume method breaks down the input sequences into k-mers and writes them to the given hashtable.

The AsyncSequenceProcessor is another abstract class, built on Async<Read>, which adds an output queue and an additional reader thread; the reader thread parses reads from a file given to start and pushes them asynchronously to the input queue. The consume threads still pull from the input queue, and are expected to push their results to the output queue. It also declares a stop_iter method, used by the Python interface, which returns false when the conditions indicate that all parsing and processing is complete.
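
A minimal Python sketch of that reader/consumer/output flow, with queue.Queue standing in for the lock-free queues (SequencePipeline and processing_done are hypothetical names; processing_done plays the role of the stop_iter signal):

```python
import queue
import threading

class SequencePipeline:
    """Sketch of the AsyncSequenceProcessor flow: a reader thread feeds the
    input queue, consume threads push results to an output queue, and
    iteration drains the output until everything is parsed and processed."""

    def __init__(self):
        self.input_queue = queue.Queue(maxsize=50000)
        self.output_queue = queue.Queue()
        self._parsing_done = False
        self._consumers = []

    def _read(self, reads):
        # the real reader thread parses reads from a file given to start
        for r in reads:
            self.input_queue.put(r)
        self._parsing_done = True

    def _consume(self):
        # pull from the input queue, push results to the output queue
        while not (self._parsing_done and self.input_queue.empty()):
            try:
                r = self.input_queue.get(timeout=0.01)
            except queue.Empty:
                continue
            self.output_queue.put(r.upper())  # stand-in for real processing

    def start(self, reads, n_threads=2):
        threading.Thread(target=self._read, args=(reads,)).start()
        for _ in range(n_threads):
            t = threading.Thread(target=self._consume)
            t.start()
            self._consumers.append(t)

    def processing_done(self):
        # true only once all consumers have exited and the output is drained
        finished = all(not t.is_alive() for t in self._consumers)
        return finished and self.output_queue.empty()

    def __iter__(self):
        while not self.processing_done():
            try:
                yield self.output_queue.get(timeout=0.01)
            except queue.Empty:
                continue
```

Note the ordering argument: a consumer exits only after parsing is done and the input queue is empty, so checking "all consumers dead and output empty" cannot miss an in-flight read.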

AsyncDiginorm (and any other future read processors, say, abundfilt) inherits from AsyncSequenceProcessor. Its consume method implements digital normalization with a cutoff value given to the start method.
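
The per-read decision made in that consume method is standard normalize-by-median, which can be illustrated serially (K, CUTOFF, and the Counter-backed table are toy stand-ins, not khmer's API, and the async hash writer is elided):

```python
from collections import Counter
from statistics import median

K = 4        # toy k-mer size; real runs use larger k
CUTOFF = 3   # coverage cutoff, passed to start in the PR

def kmers(seq, k=K):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def diginorm(reads, cutoff=CUTOFF):
    """Serial sketch of the decision each AsyncDiginorm consume thread
    makes per read."""
    counts = Counter()
    kept = []
    for read in reads:
        # keep the read only if its median k-mer count is below the cutoff
        if median(counts[km] for km in kmers(read)) < cutoff:
            counts.update(kmers(read))  # in the PR this goes to the async writer
            kept.append(read)
    return kept
```

With a cutoff of 3, repeated copies of the same read are kept until their k-mer counts saturate, after which the rest are discarded.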


Python Interface

As expected, the various processors are exposed as Python objects. For now, only AsyncDiginorm is fully wrapped, though AsyncSequenceProcessor is partially wrapped. Their new methods pull the pointer to a Hashtable object from the object's Python wrapper and pass it to the constructor. The progress getters and the start and stop methods are exposed. A user creates a counting hash, then an AsyncDiginorm object, passing that table in; they then call start with the desired cutoff, filename, and number of threads, which launches the parser thread and consume threads asynchronously. The final piece is output, which is the reason AsyncSequenceProcessor is exposed at all: it defines an iterator over the output queue, which calls iter_stop to determine status. Maintaining the class hierarchy in Python-land not only adds structure, but also avoids needing to redefine this iterator for every processor class.


Boost

This implementation uses boost::lockfree::queue (from boost/lockfree/queue.hpp), a non-blocking, lock-free, multi-producer multi-consumer queue. Queues are a potentially huge bottleneck, and these lock-free queues are considerably faster in this case than a trivial locked queue. Their implementation gives them a maximum size of 65535; this doesn't really matter, as I have limited the max queue length to 50000 anyway, to keep the read parser from getting ahead of the processor threads and filling up main memory. The parser thread simply spins until it can push to the queue again.

There is some debate to be had as to whether boost is a good solution, but for now I'd rather spend time working on khmer's internals than reinventing the data structure wheel. At the request of @mr-c, I have packaged a subset of boost in third-party. Conveniently, the boost devs provide a tool called bcp for exactly this purpose. The command I ran to extract the relevant files is:

`bcp boost/lockfree/queue.hpp --namespace=pkgboost --boost=/usr/include/ /w/lockfree`

The --namespace option renames all the boost namespaces. I have done this to make sure users are linking to our version of boost and not a local version they have installed.

This once again adds a pile of new files, but I think avoiding the hassle of implementing the queue ourselves is worthwhile. Note that going this direction also opens up the option of using boost to tackle the streaming problem, but that's for @mr-c, @ctb, and @bocajnotnef to figure out :)

NOTE: I rolled back this change for now because it once again made it impossible to review. However, this is the process we can/should use for final merge.


Further Considerations

An important consideration is that the asynchronous nature of this method means results are not replicable. In particular, digital normalization on smaller inputs can show considerable variability (+/- 20000 reads kept on a 1m read input) because of the async hash writer thread. However, as the hashtable becomes more saturated, the writer thread "catches up" to the processor threads, and the results should converge toward those of a normal, serial run. Curiously, this also means that the program runs faster the longer it runs, approaching the IO speed (given the number of threads, disk speed, etc.), because processor threads are no longer waiting for the writer thread to work through the reads in its queue.


Project TODO

  • General framework for asynchronous processing
  • Implement async hashing
  • Implement async diginorm
  • Explore existing threading performance
  • Expose implementations to Python land
  • Basic tests for async diginorm
  • Pair-awareness
  • Run with acceptance tests
  • Run with very large samples
  • Add further performance profiling to khmer-metrics
  • Test AsyncSequenceProcessor more generally
  • Implement a script for async normalize-by-median
  • Additional / better correctness checks for AsyncDiginorm
  • Additional / better exception handling
  • Add inline documentation
  • Implement AsyncFilterAbund
  • Explore threadsafe writing to the hashtable
  • Discuss the use of boost::lockfree with @ctb
  • Stick relevant parts of boost::lockfree into third-party

Merge Checklist

  • Is it mergable?
  • Did it pass the tests?
  • If it introduces new functionality in scripts/ is it tested?
    Check for code coverage.
  • Is it well formatted? Look at make pep8, make diff_pylint_report,
    make cppcheck, and make doc output. Use make format and manual fixing as needed.
  • Is it documented in the ChangeLog?
  • Was a spellchecker run on the source code and documentation after
    changes were made?

@ctb ctb mentioned this pull request Dec 14, 2014
@ctb ctb changed the title khmer multprocessing + seqan khmer multiprocessing + seqan Dec 14, 2014
camillescott (Member Author)

Re: my previous comment about that serious memory leak, it has been resolved in #692.

mr-c (Contributor) commented Jan 26, 2015
  • @camillescott to make another branch with the Boost bits that Jenkins can build
  • @mr-c to test that branch on the BaTLab systems
  • @camillescott to peel off chunks of that new branch into digestible pull requests for merging

camillescott and others added 8 commits January 26, 2015 16:57
Uses boost's included bcp utility to extract relevant files. Command used was:

`bcp boost/lockfree/queue.hpp --namespace=pkgboost --boost=/usr/include/ third-party/boost`

This places the packaged boost into its own namespace, `pkgboost`, to avoid collisions with existing boost installations.
… issue, add explicit check for is_threadsafe, add more explicit state management to async_sequence_processor
ctb (Member) commented Jun 12, 2015

ping @camillescott

@ctb ctb modified the milestones: 2.0, 1.4+ Jun 12, 2015
@ctb ctb removed this from the 2.0 milestone Jul 22, 2015
@mr-c mr-c added this to the unscheduled milestone Sep 4, 2015
3 participants