search_msa database sequencing #110

shawncal · 2022-01-27T18:57:44Z

This PR sequences the MSA search: it first searches UniRef (4 passes), then expands to BFD (4 passes). Because these database files are very large, this ordering improves the chance that the UniRef db stays in cache rather than being loaded (or streamed, if pulled from a network resource) 4 separate times.

For sequences that are well represented in UniRef, we may never need to load or search BFD at all, which will speed things up significantly.
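The ordering described above can be sketched roughly as follows. This is a minimal illustration and not the script changed in this PR: the database paths, the E-value list, the `MIN_HITS` threshold, and the `run_search` and `search_msa` helpers are all assumptions made for the example, and a raw hit count stands in for the coverage-based checks the PR mentions.

```shell
#!/bin/bash
# Hypothetical sketch: search one database at a time so that a large
# database can stay in the page cache across consecutive passes.

# Placeholder paths and values (assumptions, not taken from the PR).
DATABASES=("${UNIREF_DB:-/path/to/uniref}" "${BFD_DB:-/path/to/bfd}")
EVALUES=(1e-30 1e-10 1e-6 1e-3)
MIN_HITS="${MIN_HITS:-2000}"

# Count sequences in an A3M file: one ">" header line per sequence.
count_hits() {
    grep -c '^>' "$1" || true
}

# Hypothetical wrapper around a single hhblits pass. Override it to
# exercise the control flow without the real tool or databases.
run_search() {
    local db="$1" evalue="$2" in_a3m="$3" out_a3m="$4"
    hhblits -i "$in_a3m" -d "$db" -e "$evalue" -oa3m "$out_a3m" -o /dev/null
}

# Outer loop over databases, inner loop over E-values: every pass over
# the first database finishes before the second one is ever opened.
# Prints the path of the final alignment on stdout, logs on stderr.
search_msa() {
    local query="$1" workdir="$2"
    local prev="$query"
    local db evalue out n
    for db in "${DATABASES[@]}"; do
        for evalue in "${EVALUES[@]}"; do
            out="$workdir/$(basename "$db").$evalue.a3m"
            run_search "$db" "$evalue" "$prev" "$out"
            n=$(count_hits "$out")
            echo "db=$db e=$evalue hits=${n:-0}" >&2
            prev="$out"
            if [ "${n:-0}" -gt "$MIN_HITS" ]; then
                echo "$out"   # enough hits: later passes and databases are skipped
                return 0
            fi
        done
    done
    echo "$prev"
}
```

With this layout, a query that collects enough hits from the first database returns before the second database is touched, which is the case the description expects to be fastest.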

Other, minor changes:

- added logs to report number of MSA hits after each hhblits pass
- added "set -e" for early exit on failure
- completed the cov75 check (and potential return case) before beginning the hhfilter for cov50
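A rough sketch of how these three changes could fit together after a single hhblits pass is shown below. It is an illustration under stated assumptions, not the code in this PR: `check_pass` and `run_filter` are hypothetical helpers, the 90% identity cutoff and the `MIN_COV50` value are assumed, and `MIN_COV75` simply reuses the figure of 2000 mentioned in the discussion.

```shell
#!/bin/bash
# Hypothetical sketch of the per-pass check order.
set -e   # exit early as soon as any command fails

MIN_COV75="${MIN_COV75:-2000}"   # figure mentioned in the discussion
MIN_COV50="${MIN_COV50:-4000}"   # assumed value, for illustration only

# Count sequences in an A3M file: one ">" header line per sequence.
count_hits() {
    grep -c '^>' "$1" || true
}

# Hypothetical wrapper around hhfilter: keep sequences below 90% pairwise
# identity (assumed cutoff) with at least the given query coverage.
run_filter() {
    local in_a3m="$1" cov="$2" out_a3m="$3"
    hhfilter -i "$in_a3m" -o "$out_a3m" -id 90 -cov "$cov"
}

# Prints the path of a sufficiently deep filtered MSA, or nothing when
# another search pass is needed. Always returns 0 so it is safe under set -e.
check_pass() {
    local a3m="$1" n
    n=$(count_hits "$a3m")
    echo "MSA hits after hhblits pass: ${n:-0}" >&2

    # Finish the cov75 check, including the possible return, first.
    run_filter "$a3m" 75 "$a3m.cov75"
    n=$(count_hits "$a3m.cov75")
    if [ "${n:-0}" -gt "$MIN_COV75" ]; then
        echo "$a3m.cov75"
        return 0   # the cov50 filter is never started
    fi

    # Only now pay for the cov50 filter.
    run_filter "$a3m" 50 "$a3m.cov50"
    n=$(count_hits "$a3m.cov50")
    if [ "${n:-0}" -gt "$MIN_COV50" ]; then
        echo "$a3m.cov50"
    fi
    return 0
}
```

Running the cov75 filter to completion before starting the cov50 filter means the second filter is skipped whenever the first already yields a deep enough alignment.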

BJWiley233 · 2023-01-27T22:35:39Z

Did you find that structures were as good without searching BFD? AlphaFold still does what this script did before, running against both databases at the same time; maybe it could be adjusted to run in a similar way, UniRef first and then BFD. I guess they don't mind having far more than 2000 proteins at >=75% coverage?

Sequencing databases for better mem cache

81c65ef

minkbaek merged commit fcf9125 into RosettaCommons:main Jan 27, 2022
