

Inconsistency in keeping square brackets from node comments from phylogenetic trees (Nexus VS Phylo) #4009

Open
Gullumluvl opened this issue Jul 29, 2022 · 2 comments

@Gullumluvl

Hi, I am using Biopython 1.79.

I admit this is a minor detail, but I am recording it here in case comment parsing is updated in the future. I could help with that work.

Expected behaviour

Node comments should have an identical format whether they come from the Phylo parser or the Nexus parser. In particular, they should not include the surrounding square brackets.

Actual behaviour

  1. Using Phylo.read(format="newick"): the square brackets are stripped.
  2. Using Nexus.Nexus() or Phylo.read(format="nexus"): the square brackets are kept.

In detail (see the steps below to reproduce):

Bio.Nexus.Nexus parser:            comment a = '[&varX=5]'
Bio.Phylo parser, format="nexus":  comment a = '[&varX=5]'
Bio.Phylo parser, format="newick": comment a = '&NHX:varX=5'
Bio.Phylo parser, format="newick", input line from nexus file: comment a = '&varX=5'
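
Until the parsers agree, the comment can be normalised on the caller's side. This is a minimal sketch, assuming the comment is a plain string (or None) and that at most one pair of enclosing square brackets needs to be removed. `strip_comment_brackets` is a hypothetical helper name, not part of Biopython:

```python
def strip_comment_brackets(comment):
    """Return the comment without one pair of enclosing square brackets.

    Hypothetical helper for illustration; not a Biopython function.
    """
    if comment is None:
        return None
    text = comment.strip()
    # Only strip when the brackets really enclose the whole comment.
    if len(text) >= 2 and text.startswith("[") and text.endswith("]"):
        return text[1:-1]
    return text


# The comment strings reported above for node "a".
for raw in ("[&varX=5]", "&varX=5", "&NHX:varX=5"):
    print(repr(strip_comment_brackets(raw)))
```

Applied to `node_a.data.comment` (Nexus) or `node_a.comment` (Phylo), this should give the same string regardless of which parser produced it.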

Steps to reproduce

from io import StringIO
from Bio import Phylo
from Bio.Nexus import Nexus


nexus_txt = """Begin trees;
    tree TREE1 = ((a[&varX=5]:1,b[&varX=6,varY=abc]:1):2,c:3):0.5;
End;
"""

NHX_txt = "((a[&NHX:varX=5]:1,b[&NHX:varX=6:varY=abc]:1):2,c:3):0.5;"


nx = Nexus.Nexus(StringIO(nexus_txt))
tree = nx.trees[0]
node_a = tree.node(tree.search_taxon('a'))
print('Bio.Nexus.Nexus parser:            comment a = %r' % node_a.data.comment)

tree = Phylo.read(StringIO(nexus_txt), 'nexus')
node_a = tree.find_any(name='a')
print('Bio.Phylo parser, format="nexus":  comment a = %r' % node_a.comment)

tree = Phylo.read(StringIO(NHX_txt), 'newick')
node_a = tree.find_any(name='a')
print('Bio.Phylo parser, format="newick": comment a = %r' % node_a.comment)

tree = Phylo.read(StringIO(nexus_txt.splitlines()[1].split()[-1]), 'newick')
node_a = tree.find_any(name='a')
print('Bio.Phylo parser, format="newick", input line from nexus file: comment a = %r' % node_a.comment)
@peterjc
Member
peterjc commented Aug 9, 2022

It was confusing that your example didn't use the same Newick tree for every parser, e.g.

NHX_txt = "((a[&NHX:varX=5]:1,b[&NHX:varX=6:varY=abc]:1):2,c:3):0.5;"
nexus_txt = f"""Begin trees;
    tree TREE1 = {NHX_txt}
End;
"""

That makes your point much more clearly:

Bio.Nexus.Nexus parser:            comment a = '[&NHX:varX=5]'
Bio.Phylo parser, format="nexus":  comment a = '[&NHX:varX=5]'
Bio.Phylo parser, format="newick": comment a = '&NHX:varX=5'
Bio.Phylo parser, format="newick", input line from nexus file: comment a = '&NHX:varX=5'

@Gullumluvl
Author

Yes, this would show the parsing behavior more clearly.

The difference was there because, from what I've seen, comment formatting in NHX differs from comment formatting in Nexus trees.
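
As a rough illustration of that difference, here is a sketch that reads both comment styles into a dict. It is based only on the two example strings in this thread: the NHX-style comment starts with `&NHX:` and separates pairs with `:`, while the Nexus-style comment starts with `&` and separates pairs with `,`. Both of those are assumptions drawn from the examples, and `parse_tree_comment` is a hypothetical helper, not a Biopython function:

```python
def parse_tree_comment(comment):
    """Parse key=value pairs from a node comment (hypothetical helper)."""
    text = comment.strip()
    # Accept the comment with or without its enclosing square brackets.
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    if text.startswith("&NHX:"):
        # NHX style as written in this thread: pairs separated by ":".
        body, sep = text[len("&NHX:"):], ":"
    elif text.startswith("&"):
        # Nexus style as written in this thread: pairs separated by ",".
        body, sep = text[1:], ","
    else:
        return {}
    pairs = (item.split("=", 1) for item in body.split(sep) if "=" in item)
    return {key: value for key, value in pairs}


print(parse_tree_comment("[&varX=6,varY=abc]"))
print(parse_tree_comment("&NHX:varX=6:varY=abc"))
```

Values stay as strings, and values that themselves contain the separator are not handled.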
