I came across the following issue when using Biopython to parse genbank files produced by the "pharokka" program. My parsing script breaks with a ValueError parsing this particular record.
This is an example of a problematic entry. The specific problem is with this qualifier:
/resistance-nodulation-cell division (RND) antibiotic efflux pump="true"
The script breaks because scanner.py cannot handle qualifier keys that wrap onto the next line, even though it can handle wrapped values.
After a lot of trial and error debugging my own script, my colleague and I changed scanner.py between lines [332-336] to add this code:
if i == -1:
    # Qualifier with no key, e.g. /pseudo
    key = line[1:]
    a = next(iterator)
    if a[0] != "/":
        j = a.find('=')
        key = key.strip()
        key += " " + a[:j]
        print(key)
        qualifiers.append((key, a[j + 1:]))
    else:
        qualifiers.append((key, None))
This solved our issue, and we would like to suggest it, or something similar, as an improvement to the scanner.py module.
Thanks!
Varada
My gut reaction is the pharokka tool is pushing the GenBank format too far, not just trying to wrap the qualifier keys, but using spaces and brackets in them too. Have you got in touch with them about this issue?
I do not think this should be supported and parsed as if it were valid by Biopython. I'm more open to a warning message, ignoring some lines, and continuing as best we can (either at the next qualifier, or the next feature).
See #4703 which will add warnings when trying to write invalid annotations like this. We might consider adding warnings on parsing as well (where the parser currently accepts things like over-long qualifier keys).
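To make the "warn, ignore some lines, and continue at the next qualifier" idea concrete, here is a minimal standalone sketch. It is not Biopython's scanner code: `parse_qualifier_lines` and `VALID_KEY` are hypothetical names, and the rule that a valid key contains only letters, digits, and underscores is an assumption made for illustration.

```python
import re
import warnings

# Assumption for this sketch: a valid qualifier key is letters, digits, underscores only.
VALID_KEY = re.compile(r"\w+")


def parse_qualifier_lines(lines):
    """Hypothetical, simplified qualifier parser (not Biopython's scanner).

    Returns a list of (key, value) tuples. A qualifier whose key fails
    VALID_KEY is reported with a warning and dropped, together with its
    continuation lines, and parsing resumes at the next "/" line.
    """
    qualifiers = []
    skipping = False  # True while discarding lines that belong to a rejected qualifier
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            # A new qualifier starts here, so any earlier skip ends.
            skipping = False
            key, sep, value = line[1:].partition("=")
            if not VALID_KEY.fullmatch(key):
                warnings.warn(f"Ignoring qualifier with invalid key: {key!r}")
                skipping = True
                continue
            # No "=" means a flag-style qualifier such as /pseudo.
            qualifiers.append((key, value if sep else None))
        elif skipping:
            # Continuation of the rejected qualifier: drop it.
            continue
        elif qualifiers and qualifiers[-1][1] is not None:
            # Continuation of the previous qualifier's value.
            key, value = qualifiers[-1]
            qualifiers[-1] = (key, value + " " + line)
        else:
            warnings.warn(f"Ignoring unexpected continuation line: {line!r}")
    return qualifiers
```

With the wrapped pharokka key from this issue, the sketch would emit one warning, drop both lines of that qualifier, and keep the qualifiers on either side of it.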
Thanks for your reply, and thanks for adding the warnings for GenBank file creation! I can certainly understand that Biopython should not accept GenBank files with such long keys, but a warning plus "move to the next qualifier" would be an amazing compromise. My biggest issue was that my entire loop broke on a single qualifier in one file, when I had 200000 records x 3000 gbk files to parse. I would have preferred that the record be thrown away, but this won't be the case for all users and their gbk files.
I will definitely bring this up with the pharokka developers!
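Until the parser changes, one user-side way to keep a long batch run alive is to parse each record in isolation, so that a `ValueError` discards only that record. The sketch below is an illustration, not an official recipe: `split_genbank_records` and `parse_tolerantly` are hypothetical helpers, and it relies on GenBank records ending with a `//` line. The single-record parser is passed in as a callable.

```python
def split_genbank_records(text):
    """Yield the text of each record, using the '//' terminator line as the boundary."""
    current = []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "//":
            yield "".join(current)
            current = []
    # Keep a trailing record that lacks a terminator, but not trailing blank lines.
    if any(part.strip() for part in current):
        yield "".join(current)


def parse_tolerantly(text, parse_one):
    """Parse every record with parse_one, collecting failures instead of stopping.

    parse_one is any callable that takes one record's text and raises
    ValueError when the record cannot be parsed.
    """
    parsed, failures = [], []
    for index, chunk in enumerate(split_genbank_records(text)):
        try:
            parsed.append(parse_one(chunk))
        except ValueError as err:
            # Remember which record failed and why, then move on to the next one.
            failures.append((index, str(err)))
    return parsed, failures
```

With Biopython, `parse_one` could be something like `lambda chunk: SeqIO.read(StringIO(chunk), "genbank")`. Splitting first costs an extra pass over each file, but it keeps one malformed record from ending the whole loop.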