search_msa database sequencing #110

Merged 1 commit on Jan 27, 2022

shawncal
Contributor

This sequences the MSA search: it first runs all four passes against UniRef, then expands to bfd for another four passes. Since these database files are very large, this improves the chance that we can keep the UniRef db in cache rather than loading it (or streaming it, if pulling from a network resource) 4 separate times.

For sequences that are well represented in UniRef, we may never need to load or search bfd at all, which will speed things up significantly.
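
A minimal sketch of the ordering described above, not the script from this PR. `run_pass` is a hypothetical stand-in for one hhblits pass plus filtering, the e-value ladder is an assumed placeholder, and the 2000-hit cutoff is borrowed from the discussion of the cov75 check:

```python
from typing import Callable, Optional, Sequence, Tuple

# Assumed values for illustration only; they are not stated in this PR.
E_VALUES = (1e-30, 1e-10, 1e-6, 1e-3)
DATABASES = ("UniRef", "bfd")


def sequenced_search(
    run_pass: Callable[[str, float], int],
    databases: Sequence[str] = DATABASES,
    e_values: Sequence[float] = E_VALUES,
    enough_hits: int = 2000,
) -> Optional[Tuple[str, float, int]]:
    """Finish every pass on one database before touching the next.

    run_pass(db, e_value) is a hypothetical callable that performs one
    search pass and returns the number of MSA hits that survive filtering.
    Returns (db, e_value, hits) for the first pass with more than
    enough_hits hits, or None if no pass gets there.
    """
    for db in databases:          # outer loop: one large database at a time
        for e_value in e_values:  # inner loop: progressively looser e-values
            hits = run_pass(db, e_value)
            if hits > enough_hits:
                return db, e_value, hits  # later databases are never loaded
    return None
```

With the database in the outer loop, a query that collects enough hits on any UniRef pass returns before bfd is opened. With the e-value in the outer loop, the two large files would be interleaved on every pass.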

Other, minor changes:

  • added logs to report number of MSA hits after each hhblits pass
  • added "set -e" for early exit on failure
  • moved the cov75 check (and its possible early return) ahead of the hhfilter run for cov50
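
The hit logging and the cov75-before-cov50 ordering can be sketched as follows. This is an illustration under assumptions, not the PR's shell code: `filter_cov` is a hypothetical stand-in for an hhfilter run at a given coverage, the 4000-hit cov50 cutoff is assumed, and hits are counted as one `>` header line per sequence in the A3M text.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("msa_sketch")


def count_hits(a3m_text: str) -> int:
    """Count sequences in A3M/FASTA-style text: one '>' header per sequence."""
    return sum(1 for line in a3m_text.splitlines() if line.startswith(">"))


def pick_alignment(
    filter_cov: Callable[[int], str],
    min_cov75_hits: int = 2000,  # cutoff mentioned in the discussion
    min_cov50_hits: int = 4000,  # assumed placeholder, not from this PR
) -> Optional[str]:
    """Check cov75 completely before the cov50 filter is even started.

    filter_cov(cov) is a hypothetical callable that returns the alignment
    filtered to at least `cov` percent coverage, as A3M text.
    """
    a3m_75 = filter_cov(75)
    n75 = count_hits(a3m_75)
    log.info("hits at cov75: %d", n75)
    if n75 > min_cov75_hits:
        return a3m_75  # early return: the cov50 filter never runs

    a3m_50 = filter_cov(50)
    n50 = count_hits(a3m_50)
    log.info("hits at cov50: %d", n50)
    if n50 > min_cov50_hits:
        return a3m_50
    return None  # caller moves on to the next pass
```

Running the cov50 filter only after the cov75 check fails avoids one filter run on every pass that already has enough high-coverage hits.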

@minkbaek minkbaek merged commit fcf9125 into RosettaCommons:main Jan 27, 2022
BJWiley233 commented Jan 27, 2023

Did you find the structures were as good without searching BFD? AlphaFold still does what this script did before, running against both databases at the same time; maybe it could be adjusted to run similarly, searching UniRef first and then BFD. I guess they don't mind ending up with far more than 2000 proteins at >=75% coverage?
