Download REBASE files on install or use #972
Comments
So far the REBASE update has been done periodically on an ad hoc basis, largely because there is no active maintainer of the restriction code. While it has generally worked fine, I recall it needing some manual intervention when a new enzyme has something strange (e.g. a previously unused special character appears in the name). It looks like it has been updated twice since the original contribution. Doing an update now would be a good test of how well documented I have left this, and a useful contribution to Biopython in itself. Long term, I wondered about fetching the REBASE files semi-automatically (as part of a module rewrite), a bit like the NCBI DTD file downloads for the Entrez parser.
I have updated it recently and the update went into 1.67. Also, I made some changes to # Used REBASE emboss files version 605 (2016). In the Restriction cookbook it is written that the file will be updated with every new Biopython version, but this is obviously not the case. Actually, I wonder how often this is really necessary: While the number of known restriction enzymes has increased considerably (I think I recall that there are more than 100 enzymes new in
As far as I know, updating REBASE was never written down as part of the Biopython release process - but that isn't a bad idea: http://biopython.org/wiki/Building_a_release Apologies @MarkusPiotrowski - I missed 8852490 while looking over the log.
I guess it was the intention of the original author of the Actually, I don't understand what exactly the problem with the REBASE files is. As I understand it, the guys from REBASE are quite happy with other packages distributing their files, as long as this is mentioned and their latest paper is cited. Also, they claim that they would provide files in other formats if asked (so we could ask them to provide Biopython-specific files, if we want to).
The EMBOSS format data files we use start:
However http://rebase.neb.com/rebase/rebhelp.html currently says:
I wonder if the wording changed? Sadly the site's
@peterjc @MarkusPiotrowski Thanks for all the input! Regarding the usage of the files and the conversion to a different format (another, future, issue): I was thinking that loading the data into a simple SQLite db would be nice. This would make it easy to query on many different features/properties when looking for an enzyme. Then, once an enzyme is chosen, the data for that enzyme would be loaded into a dict.
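The SQLite idea above can be sketched roughly as follows. This is only an illustration, not Biopython code: the table layout, the function names (`build_db`, `find_names`, `load_enzyme`) and the three hand-written sample rows are all assumptions made for the example; a real loader would parse the REBASE files instead.

```python
import sqlite3

# Hypothetical sample rows, written by hand for illustration only.
# Columns: name, recognition site, site length, overhang type.
SAMPLE_ENZYMES = [
    ("EcoRI", "GAATTC", 6, "5'"),
    ("KpnI", "GGTACC", 6, "3'"),
    ("SmaI", "CCCGGG", 6, "blunt"),
]


def build_db(rows):
    """Create an in-memory database with one (assumed) enzyme table."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows can then be converted to dicts
    conn.execute(
        "CREATE TABLE enzyme ("
        "name TEXT PRIMARY KEY, site TEXT NOT NULL, "
        "size INTEGER NOT NULL, overhang TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO enzyme VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def find_names(conn, overhang):
    """Query step: names of all enzymes with the given overhang type."""
    cur = conn.execute(
        "SELECT name FROM enzyme WHERE overhang = ? ORDER BY name", (overhang,)
    )
    return [row["name"] for row in cur]


def load_enzyme(conn, name):
    """Load step: the chosen enzyme as a plain dict, or None if unknown."""
    row = conn.execute("SELECT * FROM enzyme WHERE name = ?", (name,)).fetchone()
    return dict(row) if row is not None else None
```

With something like this, `find_names(conn, "blunt")` would narrow down the candidates and `load_enzyme(conn, "EcoRI")` would give the dict for the enzyme finally chosen. Whether this is worth the extra moving parts compared to plain dicts is exactly the open question.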
SQLite might be overkill - fresh eyes on #268 would be good, as I have failed to set aside the time to look at it.
At the moment the REBASE files are not distributed (and they are also not in the repository).
From @peterjc: "Avoid the legal grey area about the REBASE files by downloading them either at install time or usage." See "New restriction analysis library, v2" (#268).
Proposal:
$ python setup.py build --rebase_update
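For the "at usage" variant, a minimal download-and-cache helper might look like the sketch below. Everything named here is an assumption for illustration: the function `ensure_rebase_file`, the cache directory, and the base URL are hypothetical, and the file name used in the usage note (`emboss_e.605`) is only a guess at the naming scheme based on the "version 605" mentioned above. The `fetch` parameter exists so the download step can be swapped out, e.g. for testing without network access.

```python
import os
import urllib.request


def ensure_rebase_file(filename, cache_dir, base_url,
                       fetch=urllib.request.urlretrieve):
    """Return a local path to `filename`, downloading it only if missing.

    `base_url` must be supplied by the caller; no REBASE address is
    hard-coded here because the correct one is not confirmed in this thread.
    """
    os.makedirs(cache_dir, exist_ok=True)
    local_path = os.path.join(cache_dir, filename)
    if not os.path.exists(local_path):
        # Download to a temporary name first so that an interrupted
        # transfer never leaves a half-written file in the cache.
        tmp_path = local_path + ".part"
        fetch(base_url.rstrip("/") + "/" + filename, tmp_path)
        os.replace(tmp_path, local_path)
    return local_path
```

A call such as `ensure_rebase_file("emboss_e.605", cache_dir, base_url)` would then hit the network only the first time. The same helper could be called from a build step (for the install-time variant) or lazily when the restriction module is first used.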