get_support function does not display branch supports for all #1887

Open
SidraYounas opened this issue Jan 3, 2019 · 24 comments
@SidraYounas
SidraYounas commented Jan 3, 2019

Setup

I am reporting a problem with Biopython version, Python version, and operating
system as follows:

Python implementation : CPython
Python platform: Windows-8-6.2.9200-SP0
Biopython version: 1.72

Expected behavior

I have used the get_support() function from the Bio.Phylo module (Bio.Phylo.Consensus) to generate support values for the branches of my original tree, as follows:

msa = AlignIO.read("example.phy", "phylip")
calculator = DistanceCalculator()
constructor = DistanceTreeConstructor(calculator)
trees = bootstrap_trees(msa, 1000, constructor)
tree = list(trees)  # list of replicate trees from the bootstrap
# example_tree is the tree generated from the MSA data
support_tree = get_support(example_tree, tree)

# The resulting tree is drawn in a new window, which
# must show the branch support values for every branch point
Phylo.draw(support_tree)

Example sequence file is:
example.pdf
The required result tree should look like this (i.e., it shows branch support values for every node):
exp_1

Actual behavior

When Phylo.draw(support_tree) is executed, the resulting tree shows values on only a few of the branches, not on all of the branching points, as the tree image shows.
The resulting tree is:
exp_1

Can you suggest any solution for this?

One more thing: if I generate a bootstrap consensus tree for the same MSA data, as follows:

msa = AlignIO.read("example.phy", "phylip")

calculator = DistanceCalculator()
constructor = DistanceTreeConstructor(calculator)
trees = bootstrap_trees(msa, 50, constructor)
tree = []
tree = list(trees)
target_tree = tree[0]
support_tree = get_support(target_tree, tree)
Phylo.draw(support_tree)

it does show the bootstrap support values for all of the branching points:
figure_1-1

and none of the values is a floating-point number; they are all integers, whereas they are shown as floating-point numbers for my tree.
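
The integer-versus-float difference can be reproduced with plain arithmetic. This is a minimal sketch, assuming the support value is the percentage of replicates that contain a clade; the function names and the match counts are chosen for illustration and are not taken from Biopython:

```python
def support_percent(count, n_replicates):
    """Support as the percentage of replicates that contain a clade."""
    return count * 100.0 / n_replicates


def is_whole(value):
    """True if the value has no fractional part."""
    return value == int(value)


# With 50 replicates each match is worth exactly 2 percentage points,
# so every possible support value is a whole number.
whole_50 = all(is_whole(support_percent(k, 50)) for k in range(51))

# With 1200 replicates each match is worth 1/12 of a point,
# so most match counts give fractional percentages.
examples_1200 = [support_percent(k, 1200) for k in (2, 6, 21, 1149, 1191, 1200)]

print(whole_50)
print(examples_1200)
```

Under this assumption, fractional values are expected whenever the number of replicates does not divide evenly into 100 times the match count, so they do not by themselves indicate an error.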

@SidraYounas
Author
SidraYounas commented Jan 6, 2019

I have edited my issue to add images to explain the issue more clearly.
Are there any suggestions for the issue yet? @etal

@peterjc
Member
peterjc commented Jan 7, 2019

Where is "example.phy" from? Can you share the file? Without it we can't repeat your example.

@SidraYounas
Author
SidraYounas commented Jan 7, 2019

Yeah sure, I can provide the sequence text file in FASTA format. The "example.phy" file is the MSA for this data in "phylip" format, generated using clustalw.
Here is the sequence file:
example.txt

And here is the "example.phy" file in plain text format, which can be used for testing. (Since the ".phy" extension was not supported for upload, I earlier had to upload it as a PDF. I have now uploaded it as a text file, which will have to be renamed with the ".phy" extension to be tested with the script.)
example.txt

@peterjc
Member
peterjc commented Jan 7, 2019

Thanks.

Have you looked to see if all the bootstrap values are there if saved to a tree file (e.g. Newick format, or PhyloXML)? Either look at the files, or try opening them in a tool like FigTree. This should tell you if the problem is in the data, or the drawing.

@SidraYounas
Author
SidraYounas commented Jan 7, 2019

Alright, I am trying to check it this way. I will tell you once the results are generated.
Meanwhile, I want to raise one more issue with the Phylo.draw function: if the data set is too large, the drawn tree looks like a disaster.
Like this:
figure_1

The size of the window remains fixed no matter how large the data to display becomes. Isn't there any way to make the window scrollable according to the data file size, so that it gives a good display?

@peterjc
Member
peterjc commented Jan 7, 2019

I think you would need to look at enhancing the Bio.Phylo code if you wanted more flexibility.

In practice I would use the Biopython code for large scale automation (e.g. generating hundreds of trees and images), but when I wanted to prepare a single image for publication I would likely use a dedicated package like http://tree.bio.ed.ac.uk/software/figtree/ (which can help with things like colour coding).

@SidraYounas
Author

Can't the axes sizes be made flexible for the "draw()" function?

@peterjc
Member
peterjc commented Jan 7, 2019

Yes in principle, although it is less clear how best to extend the API for flexibility and ease of use (more and more optional arguments for things like image size and font size could become unwieldy). If you worked on this you'd be in a position to suggest something, but it would be down to @etal to review it.

@SidraYounas
Author

The thing is, I can't first generate a tree and then go to some other tool to display it. This is why I wanted your help: how can I enlarge the window axes for display, so that the window becomes scrollable when the data file is larger?
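
A possible workaround that needs no change to Bio.Phylo is sketched below. It assumes that Phylo.draw() accepts a matplotlib axes object through its axes argument and that do_show=False suppresses the automatic display. The helper names and the inches_per_leaf value are made up for illustration. It does not make the window scrollable; it only creates a taller figure for larger trees, which can then be saved to a file and viewed at full size:

```python
def figure_size_for(n_leaves, inches_per_leaf=0.25, width=10.0, min_height=4.0):
    """Return (width, height) in inches, growing the height with the leaf count."""
    return (width, max(min_height, n_leaves * inches_per_leaf))


def draw_scaled(tree, out_path=None):
    """Draw `tree` on a figure whose height depends on its number of terminals."""
    # Imported here so the sizing helper above stays usable without these packages.
    import matplotlib.pyplot as plt
    from Bio import Phylo

    fig = plt.figure(figsize=figure_size_for(tree.count_terminals()))
    axes = fig.add_subplot(1, 1, 1)
    # do_show=False leaves saving or showing the figure to the caller.
    Phylo.draw(tree, axes=axes, do_show=False)
    if out_path is not None:
        fig.savefig(out_path, dpi=150)
    else:
        plt.show()


print(figure_size_for(200))
```

For example, draw_scaled(support_tree, "support_tree.png") would write the image to a file (the file name is only an example) instead of opening a fixed-size window.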

@SidraYounas
Author

One thing that I do not understand: when I use the first tree from the bootstrap replicates as the reference to be compared with the rest of the replicates, the resulting tree shows high confidence values for all of the branches, even when I used just 50 bootstrap replicates.
Then why does it not give the bootstrap support for the same inner nodes when they are present in the original tree?

@peterjc
Member
peterjc commented Jan 7, 2019

Your examples are still incomplete, you are missing the imports - this is enough for the second example to run:

from Bio import AlignIO
from Bio import Phylo
from Bio.Phylo.Consensus import bootstrap_trees, get_support
from Bio.Phylo.TreeConstruction import DistanceCalculator
from Bio.Phylo.TreeConstruction import DistanceTreeConstructor

The first example does not define example_tree, so it fails.

@SidraYounas
Author

No, see my imports.

from Bio import Phylo
from Bio.Phylo.TreeConstruction import *
from Bio.Phylo.Consensus import *
from Bio.Phylo import Newick
from Bio import AlignIO
from matplotlib import pyplot as plt

If I hadn't been using them, I could not have generated the tree files. I am really sorry that I forgot to mention them.

@SidraYounas
Author
SidraYounas commented Jan 7, 2019

Here is the complete code:

from Bio import Phylo
from Bio.Phylo.TreeConstruction import *
from Bio.Phylo.Consensus import *
from Bio.Phylo import Newick
from Bio import AlignIO
from matplotlib import pyplot as plt

aln = AlignIO.read(open('example.phy'), 'phylip')

calculator = DistanceCalculator()
dm = calculator.get_distance(aln)
constructor = DistanceTreeConstructor()

njtree = constructor.nj(dm)
starting_tree = njtree
scorer = ParsimonyScorer()
searcher = NNITreeSearcher(scorer)
constructor = ParsimonyTreeConstructor(searcher, starting_tree)

example_tree = constructor.build_tree(aln)

# Bootstrap
calculator = DistanceCalculator()
constructor = DistanceTreeConstructor(calculator)
trees = bootstrap_trees(aln, 1200, constructor)
tree = list(trees)
support_tree = get_support(example_tree, tree)
Phylo.draw(support_tree)

This is how an MSA data file is used to generate a tree, after which bootstrap replicates for this data are produced. Then, using the "get_support()" function, the original tree is compared with the bootstrap replicates to obtain branch support values.
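
To make the comparison concrete, here is a toy sketch of what branch support measures: for each inner clade of the reference tree, the percentage of replicate trees that contain a clade with exactly the same set of leaves. It is not Biopython's implementation. Trees are plain nested tuples, the groupings are invented, and clades are treated as rooted leaf sets for simplicity:

```python
def clades(tree):
    """Return the inner clades of a nested-tuple tree, each as a frozenset of leaf names."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found


def support(target, replicates):
    """Percentage of replicates that contain each inner clade of `target`."""
    replicate_clades = [clades(rep) for rep in replicates]
    return {
        clade: 100.0 * sum(clade in rc for rc in replicate_clades) / len(replicates)
        for clade in clades(target)
    }


# Reference tree built one way, replicates built another way (invented data).
target = ((("Human", "Chimpanzee"), "Mouse"), ("Dog", "Cow"))
replicates = [
    (("Human", "Chimpanzee"), (("Dog", "Cow"), "Mouse")),
    ((("Human", "Chimpanzee"), ("Dog", "Cow")), "Mouse"),
    (("Human", "Chimpanzee"), (("Dog", "Cow"), "Mouse")),
    ((("Human", "Chimpanzee"), ("Dog", "Cow")), "Mouse"),
]

result = support(target, replicates)
for clade, value in sorted(result.items(), key=lambda item: sorted(item[0])):
    print(sorted(clade), value)
```

In this toy data the Human/Chimpanzee/Mouse grouping of the reference tree never occurs in the replicates, so its support is 0.0. A reference tree picked from the replicates themselves would have every clade matched at least once, by itself.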

@peterjc
Member
peterjc commented Jan 7, 2019

Adding this line to your complete example is instructive:

Phylo.write(support_tree, "example.bootstrap1200.xml", "phyloxml")

Quoting in full,

<phyloxml xmlns="http://www.phyloxml.org" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.phyloxml.org http://www.phyloxml.org/1.10/phyloxml.xsd">
  <phylogeny rooted="true">
    <clade>
      <branch_length>0</branch_length>
      <confidence type="unknown">100.0</confidence>
      <clade>
        <name>Amphioxus</name>
        <branch_length>0.606194690265</branch_length>
      </clade>
      <clade>
        <name>Inner2</name>
        <branch_length>0.163337565699</branch_length>
        <clade>
          <name>Inner5</name>
          <branch_length>0.0357208702065</branch_length>
          <clade>
            <name>Inner7</name>
            <branch_length>0.0603068153884</branch_length>
            <clade>
              <name>Inner8</name>
              <branch_length>0.0112301806785</branch_length>
              <clade>
                <name>Inner4</name>
                <branch_length>0.0579106089338</branch_length>
                <confidence type="unknown">100.0</confidence>
                <clade>
                  <name>Chicken</name>
                  <branch_length>0.0592223861029</branch_length>
                </clade>
                <clade>
                  <name>Zebra</name>
                  <branch_length>0.0882702392658</branch_length>
                </clade>
              </clade>
              <clade>
                <name>Inner6</name>
                <branch_length>0.0260808443953</branch_length>
                <clade>
                  <name>Inner10</name>
                  <branch_length>0.0481348328417</branch_length>
                  <confidence type="unknown">0.166666666667</confidence>
                  <clade>
                    <name>Inner12</name>
                    <branch_length>0.0106845731932</branch_length>
                    <confidence type="unknown">0.5</confidence>
                    <clade>
                      <name>Inner9</name>
                      <branch_length>0.0152994791667</branch_length>
                      <confidence type="unknown">95.75</confidence>
                      <clade>
                        <name>Dog</name>
                        <branch_length>0.0476585545723</branch_length>
                      </clade>
                      <clade>
                        <name>Cow</name>
                        <branch_length>0.0585361356932</branch_length>
                      </clade>
                    </clade>
                    <clade>
                      <name>Inner11</name>
                      <branch_length>0.00483095962389</branch_length>
                      <confidence type="unknown">1.75</confidence>
                      <clade>
                        <name>Inner3</name>
                        <branch_length>0.0761459485619</branch_length>
                        <confidence type="unknown">100.0</confidence>
                        <clade>
                          <name>Human</name>
                          <branch_length>0.00359513274336</branch_length>
                        </clade>
                        <clade>
                          <name>Chimpanzee</name>
                          <branch_length>0.00525442477876</branch_length>
                        </clade>
                      </clade>
                      <clade>
                        <name>Mouse</name>
                        <branch_length>0.119096454031</branch_length>
                      </clade>
                    </clade>
                  </clade>
                  <clade>
                    <name>Elephant</name>
                    <branch_length>0.0678091583702</branch_length>
                  </clade>
                </clade>
                <clade>
                  <name>Platypus</name>
                  <branch_length>0.128667562158</branch_length>
                </clade>
              </clade>
            </clade>
            <clade>
              <name>Anole</name>
              <branch_length>0.154110588741</branch_length>
            </clade>
          </clade>
          <clade>
            <name>Xenopus</name>
            <branch_length>0.202110988201</branch_length>
          </clade>
        </clade>
        <clade>
          <name>Inner1</name>
          <branch_length>0.0294985250737</branch_length>
          <confidence type="unknown">99.25</confidence>
          <clade>
            <name>Fugu</name>
            <branch_length>0.213864306785</branch_length>
          </clade>
          <clade>
            <name>Zebrafish</name>
            <branch_length>0.213864306785</branch_length>
          </clade>
        </clade>
      </clade>
    </clade>
  </phylogeny>
</phyloxml>

Some but not all of the inner nodes (which have been automatically named) do seem to have confidences, such as Inner4 or Inner10, but look at Inner2 and Inner5 for example - they do not.
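
One way to list which inner nodes carry a confidence element, without reading the XML by eye, is to walk the file with the standard library. This is a minimal sketch; the embedded document is a hand-trimmed and rearranged fragment in the style of the output quoted above (not the full tree), and the function name is made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hand-trimmed fragment in the style of the PhyloXML quoted above (not the full tree).
PHYLOXML = """\
<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <clade>
      <name>Inner2</name>
      <branch_length>0.163337565699</branch_length>
      <clade>
        <name>Inner4</name>
        <branch_length>0.0579106089338</branch_length>
        <confidence type="unknown">100.0</confidence>
        <clade><name>Chicken</name></clade>
        <clade><name>Zebra</name></clade>
      </clade>
      <clade>
        <name>Inner1</name>
        <branch_length>0.0294985250737</branch_length>
        <confidence type="unknown">99.25</confidence>
        <clade><name>Fugu</name></clade>
        <clade><name>Zebrafish</name></clade>
      </clade>
    </clade>
  </phylogeny>
</phyloxml>
"""

NS = {"phy": "http://www.phyloxml.org"}


def inner_node_confidences(xml_text):
    """Map each inner clade name to its confidence as a float, or None if absent."""
    root = ET.fromstring(xml_text)
    result = {}
    for clade in root.iter("{http://www.phyloxml.org}clade"):
        # Inner nodes are clades that contain at least one child clade.
        if clade.find("phy:clade", NS) is None:
            continue
        name = clade.findtext("phy:name", default="(unnamed)", namespaces=NS)
        conf = clade.findtext("phy:confidence", namespaces=NS)
        result[name] = float(conf) if conf is not None else None
    return result


confidences = inner_node_confidences(PHYLOXML)
for name, conf in confidences.items():
    print(name, "->", "no confidence element" if conf is None else conf)
```

Note that this only reports whether the element is present in the file; it cannot tell whether an absent element means "zero support" or "never set".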

The question becomes: are these bootstrap values really missing, or are they really branches with zero bootstrap support?

Given the way Python evaluates numbers when cast as a boolean, a bootstrap value of zero is a special case and that might be why it was not reported in the XML output, or the drawing:

https://github.com/biopython/biopython/blob/biopython-173/Bio/Phylo/_utils.py#L370

@SidraYounas
Author
SidraYounas commented Jan 7, 2019

So what does this mean? Do those particular branching points have 0 confidence? And if so, how do I change it so that it shows the value even when it is zero?
My question is:
For the same MSA file, when bootstrap replicates are generated and one of those trees is used as the reference to be compared with the rest of the replicates to obtain branch support values, that tree displays the support values for similar branching topology, while the tree in the previous example does not.

from Bio import AlignIO
from Bio import Phylo
from Bio.Phylo.TreeConstruction import *
from Bio.Phylo import Consensus
from Bio.Phylo.Consensus import *

msa = AlignIO.read("example.phy", "phylip")

calculator = DistanceCalculator()
constructor = DistanceTreeConstructor(calculator)

trees = bootstrap_trees(msa, 1000, constructor)
tree = []
tree = list(trees)
target_tree = tree[0]

support_tree = get_support(target_tree, tree)
Phylo.draw(support_tree)

Do you get my point of confusion?

@peterjc
Member
peterjc commented Jan 7, 2019

My guess right now is the missing bootstrap values are simply zero.

Do you agree that the same values are missing in the figures as in the data files? For example, look at the saved trees in a text editor or graphical tree tool. If so, that is good progress - it strongly suggests the problem is not in the tree drawing.

If you do not want to delve into debugging Biopython, and need to resolve this urgently, try doing the bootstrap analysis in another set of tools - for example PHYLIP. It used to be good advice to generate phylogenetic trees by at least two different methods anyway, to confirm they give reasonably consistent results, and so provide reassurance that your analysis is not based on an artefact of a particular tree building tool.

@SidraYounas
Author

Alright, I might test my trees again.

@SidraYounas
Author

Isn't there any way to display '0' for the branch support where it is zero, rather than showing nothing?

@peterjc
Member
peterjc commented Jan 8, 2019

If you want to experiment with the non-display of zero bootstraps, try changing this line:

https://github.com/biopython/biopython/blob/biopython-173/Bio/Phylo/_utils.py#L370

Currently:

                if clade.confidence:
                    return conf2str(clade.confidence)

This might work better by showing zero:

                if clade.confidence is not None:
                    return conf2str(clade.confidence)

I am guessing that None is also a possible value if there is no bootstrap.
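
The difference between the two checks above can be seen in isolation. The sketch below uses hypothetical label functions (not the actual code in Bio/Phylo/_utils.py) and str() as a stand-in for conf2str:

```python
def label_if_truthy(confidence):
    """Mimics `if clade.confidence:`; 0 and 0.0 are falsy, so no label is produced."""
    if confidence:
        return str(confidence)
    return None


def label_if_not_none(confidence):
    """Mimics `if clade.confidence is not None:`; only a missing value is skipped."""
    if confidence is not None:
        return str(confidence)
    return None


# Compare a normal value, a zero value, and a missing value under both checks.
for value in (95.75, 0.0, None):
    print(repr(value), label_if_truthy(value), label_if_not_none(value))
```

With the second check a zero confidence is labelled, but a confidence of None still produces no label, so the change only helps if the stored value is actually zero.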

@etal
Contributor
etal commented Jan 9, 2019

I think is not None should be added to the code there. Mind if I add that without a unit test?

@SidraYounas
Author

Yeah okay, I'll give it a try now.

@peterjc
Member
peterjc commented Jan 9, 2019

@etal I would expect the line of code to be exercised by the existing tests, and with graphical output we don't currently have any way of confirming the output except inspection by eye - so sure, if it works in a hands-on test, go ahead and commit it.

Is there potentially something similar happening with the PhyloXML (and other format) output?

@SidraYounas
Author
SidraYounas commented Jan 11, 2019

is not None does not work. The result remains the same. None of the branches shows a zero value.
Have you tested the files yet?

peterjc added a commit that referenced this issue Jan 14, 2019
Ommission found while exploring #1887, fix agreed
by Eric Talevich.
@peterjc
Member
peterjc commented Jan 14, 2019

I tested adding is not None in the draw function with a modified version of Tests/Nexus/bats.nex and the following:

from Bio import Phylo
t = Phylo.read("bats2.nex", "nexus")
Phylo.draw(t)

(I did a quick search and replace of )0.23: and )0.50: with )0.00: for this)

Given @etal's comment earlier, I went ahead and committed that fix in 5ba7092 to label zero value bootstraps in Bio.Phylo.draw().

However, this alone does not solve @SidraYounas's problem. Someone needs to dig into the data structure to understand why these interior nodes don't seem to have a bootstrap value.
