Inconsistency in keeping square brackets from node comments from phylogenetic trees (Nexus VS Phylo) #4009

Gullumluvl · 2022-07-29T16:47:28Z

Hi, I am using Biopython 1.79.

Sorry, I admit this is a minor detail, but I am reporting it in case comment parsing is updated in the future. I would be happy to help with that.

Expected behaviour

Comments should have an identical format in the Phylo parser and the Nexus parser, specifically without the surrounding square brackets.

Actual behaviour

Using Phylo.read(format="newick"): the square brackets are stripped.
Using Nexus.Nexus() or Phylo.read(format="nexus"): the square brackets are kept.

In detail (see below to reproduce):

Bio.Nexus.Nexus parser:            comment a = '[&varX=5]'
Bio.Phylo parser, format="nexus":  comment a = '[&varX=5]'
Bio.Phylo parser, format="newick": comment a = '&NHX:varX=5'
Bio.Phylo parser, format="newick", input line from nexus file: comment a = '&varX=5'

Steps to reproduce

from io import StringIO
from Bio import Phylo
from Bio.Nexus import Nexus


nexus_txt = """Begin trees;
    tree TREE1 = ((a[&varX=5]:1,b[&varX=6,varY=abc]:1):2,c:3):0.5;
End;
"""

NHX_txt = "((a[&NHX:varX=5]:1,b[&NHX:varX=6:varY=abc]:1):2,c:3):0.5;"


nx = Nexus.Nexus(StringIO(nexus_txt))
tree = nx.trees[0]
node_a = tree.node(tree.search_taxon('a'))
print('Bio.Nexus.Nexus parser:            comment a = %r' % node_a.data.comment)

tree = Phylo.read(StringIO(nexus_txt), 'nexus')
node_a = tree.find_any(name='a')
print('Bio.Phylo parser, format="nexus":  comment a = %r' % node_a.comment)

tree = Phylo.read(StringIO(NHX_txt), 'newick')
node_a = tree.find_any(name='a')
print('Bio.Phylo parser, format="newick": comment a = %r' % node_a.comment)

tree = Phylo.read(StringIO(nexus_txt.splitlines()[1].split()[-1]), 'newick')
node_a = tree.find_any(name='a')
print('Bio.Phylo parser, format="newick", input line from nexus file: comment a = %r' % node_a.comment)
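
As a stopgap until the two parsers agree, the comments can be normalized on the caller's side. This is a minimal sketch, assuming the only difference is one enclosing pair of square brackets; `strip_comment_brackets` is a hypothetical helper, not part of Biopython:

```python
def strip_comment_brackets(comment):
    """Return the comment without one enclosing pair of square brackets.

    Hypothetical helper (not a Biopython function). Comments that are None
    or that have no enclosing brackets are returned unchanged.
    """
    if comment is None:
        return None
    text = comment.strip()
    if text.startswith("[") and text.endswith("]"):
        # Drop exactly one leading "[" and one trailing "]".
        return text[1:-1]
    return text
```

With this, `strip_comment_brackets(node_a.data.comment)` from the Nexus parser and `strip_comment_brackets(node_a.comment)` from the Phylo parsers should compare equal when both were read from the same tree string.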


peterjc · 2022-08-09T10:58:08Z

It was confusing that your example didn't use the same Newick tree for both formats, e.g.

NHX_txt = "((a[&NHX:varX=5]:1,b[&NHX:varX=6:varY=abc]:1):2,c:3):0.5;"
nexus_txt = f"""Begin trees;
    tree TREE1 = {NHX_txt}
End;
"""

That makes your point much more clearly:

Bio.Nexus.Nexus parser:            comment a = '[&NHX:varX=5]'
Bio.Phylo parser, format="nexus":  comment a = '[&NHX:varX=5]'
Bio.Phylo parser, format="newick": comment a = '&NHX:varX=5'
Bio.Phylo parser, format="newick", input line from nexus file: comment a = '&NHX:varX=5'

Gullumluvl · 2022-08-09T12:59:43Z

Yes, this would show the parsing behavior more clearly.

The difference was there because, from what I've seen, comment formatting in NHX differs from comment formatting in Nexus trees.
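
To illustrate that difference using only the two comment styles from the examples in this issue: the NHX-style comment separates fields with `:` after an `NHX` tag, while the Nexus tree comment separates them with `,`. Below is a rough sketch that reads both into a dict. `parse_node_comment` is a hypothetical helper, not a Biopython function, and it assumes simple `key=value` fields with no nested braces or quoted values:

```python
def parse_node_comment(comment):
    """Parse a node comment into a dict of key/value strings.

    Hypothetical helper (not part of Biopython). Assumes only the two
    styles seen in this issue:
      NHX-style:   [&NHX:varX=6:varY=abc]  (fields separated by ":")
      Nexus-style: [&varX=6,varY=abc]      (fields separated by ",")
    Enclosing square brackets are optional.
    """
    text = comment.strip()
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    # Accept both "&NHX:" as used here and the "&&NHX:" spelling.
    text = text.lstrip("&")
    if text.startswith("NHX:"):
        fields = text[len("NHX:"):].split(":")
    else:
        fields = text.split(",")
    # Keep only well-formed key=value fields; split on the first "=".
    return dict(field.split("=", 1) for field in fields if "=" in field)
```

Both `parse_node_comment("[&varX=6,varY=abc]")` and `parse_node_comment("&NHX:varX=6:varY=abc")` give the same dict, so the result does not depend on which parser produced the comment string.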

peterjc assigned etal Aug 9, 2022
