Some residues are skipped by the PPBuilder() #4746

Open
DimaMolod opened this issue Jun 17, 2024 · 8 comments

@DimaMolod
DimaMolod commented Jun 17, 2024

Setup

I am reporting a problem with Biopython version 1.83, Python version 3.10.0, and operating system Rocky Linux 8.9 (Green Obsidian) as follows:

import sys; print(sys.version)
import platform; print(platform.python_implementation()); print(platform.platform())
import Bio; print(Bio.__version__)


3.10.0 | packaged by conda-forge | (default, Nov 20 2021, 02:24:10) [GCC 9.4.0]
CPython
Linux-4.18.0-513.9.1.el8_9.x86_64-x86_64-with-glibc2.28
1.83

Expected behaviour

output:
...
820
52.78
821
40.0
822
35.76
823
40.09
...

Actual behaviour

Residue 822 is skipped:
output:
...
820
52.78
821
40.0
822
40.09
823
34.85
...

Steps to reproduce

Use this file to reproduce: ranked_3.pdb.txt

from Bio.PDB import PDBParser, PPBuilder

file = "ranked_3.pdb"
structure = PDBParser().get_structure("rank3", file)

# aa_only=False: do not restrict the builder to the 20 standard amino acids
polypeptides = PPBuilder().build_peptides(structure, aa_only=False)

# Print a running residue counter and the B-factor of each residue's first atom
i = 0
for pp in polypeptides:
    for res in pp:
        i += 1
        print(i)
        print(next(res.get_atoms()).bfactor)
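To see exactly which residues were dropped, one option is a small helper that compares the residues of the first model with those collected into polypeptides. `find_unassigned` is a hypothetical name, not part of Biopython; it assumes `structure` is a Bio.PDB `Structure` and `polypeptides` is the list returned by `build_peptides()`. As far as we understand, `build_peptides()` only looks at model 0 when given a `Structure`, so the helper does the same.

```python
def find_unassigned(structure, polypeptides):
    """Return residues of model 0 that ended up in no polypeptide.

    Hypothetical helper (not part of Biopython). `structure` is assumed to
    be a Bio.PDB Structure and `polypeptides` the list returned by
    PPBuilder().build_peptides(). Only the first model is inspected.
    """
    # The builder appends the original Residue objects to each Polypeptide,
    # so object identity is enough to tell which residues were kept.
    assigned = {id(res) for pp in polypeptides for res in pp}
    return [res for res in structure[0].get_residues() if id(res) not in assigned]
```

Run after the script above, `for res in find_unassigned(structure, polypeptides): print(res.get_full_id())` would list the skipped residues. Non-amino-acid residues such as waters or ligands, if the file contains any, would be listed too.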
@peterjc
Member
peterjc commented Jun 17, 2024

Is there anything odd about this residue, e.g. the C–N distance, which the PPBuilder uses to find polypeptides?
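For reference, the check described here can be sketched as a distance test between the carbonyl C of one residue and the amide N of the next. This is a simplified sketch, not Biopython's code: `peptide_bond_ok` is a hypothetical name, coordinates are plain `(x, y, z)` tuples in Å, and the 1.8 Å default is the `PPBuilder` cutoff quoted later in this thread. A real peptide C–N bond is about 1.33 Å long, so the cutoff leaves some slack.

```python
import math


def peptide_bond_ok(c_prev, n_next, radius=1.8):
    """Return True if residue i's C and residue i+1's N are closer than `radius` Å.

    Hypothetical helper: c_prev and n_next are (x, y, z) tuples in Ångström.
    Uses a strict comparison; adjust if you want an inclusive cutoff.
    """
    return math.dist(c_prev, n_next) < radius


# A typical peptide bond (~1.33 Å) passes; a stretched one (1.95 Å) does not.
print(peptide_bond_ok((0.0, 0.0, 0.0), (1.33, 0.0, 0.0)))  # True
print(peptide_bond_ok((0.0, 0.0, 0.0), (1.95, 0.0, 0.0)))  # False
```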

@DimaMolod
Author

The distance between CA and N seems to be around 1.5 Å for residue 822:

ATOM 6742 N THR B 822 -8.531 -23.264 12.685 1.00 35.76 N
ATOM 6743 CA THR B 822 -9.413 -23.716 13.755 1.00 35.76 C
ATOM 6744 C THR B 822 -10.876 -23.606 13.333 1.00 35.76 C
ATOM 6745 CB THR B 822 -9.186 -22.906 15.045 1.00 35.76 C
ATOM 6746 O THR B 822 -11.761 -24.155 13.993 1.00 35.76 O
ATOM 6747 CG2 THR B 822 -7.945 -23.390 15.787 1.00 35.76 C
ATOM 6748 OG1 THR B 822 -9.020 -21.522 14.711 1.00 35.76 O
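These records are enough to check the bond lengths inside residue 822. The sketch below is plain Python with the coordinates copied from the ATOM lines above. It gives values close to the usual backbone geometry (about 1.46 Å for N–CA and about 1.52 Å for CA–C), which suggests the residue itself is fine. PPBuilder, however, tests the bonds between residues, C(821)–N(822) and C(822)–N(823), and those need atoms from the neighbouring residues that are not quoted here.

```python
import math

# Coordinates of THR B 822 copied from the ATOM records above (Å).
n = (-8.531, -23.264, 12.685)
ca = (-9.413, -23.716, 13.755)
c = (-10.876, -23.606, 13.333)

n_ca = math.dist(n, ca)  # N-CA bond inside residue 822
ca_c = math.dist(ca, c)  # CA-C bond inside residue 822
print(f"N-CA = {n_ca:.2f} Å, CA-C = {ca_c:.2f} Å")  # N-CA = 1.46 Å, CA-C = 1.53 Å
```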

@DimaMolod
Author

This is an AlphaFold2 prediction, so there could actually be something wrong with the distances.

@JoaoRodrigues JoaoRodrigues self-assigned this Jun 18, 2024
@JoaoRodrigues
Member

Hi @DimaMolod, thanks for reporting this. You can pass a distance cutoff to PPBuilder(), which defaults to 1.8 Å; for predicted models, this might be a little too strict. Alternatively, you can use CaPPBuilder(), which checks distances between consecutive Cα atoms (about 3.8 Å apart) and can be more robust. If you can share the model, it would help us debug!
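The effect of the cutoff can be illustrated with a small grouping sketch. It is not Biopython's implementation: `build_segments` is a hypothetical helper that joins consecutive residues whenever the distance between them is below `radius` and discards runs of a single residue. That is how we understand a residue whose links on both sides are broken can vanish from the builder's output entirely. The distance would be C–N for `PPBuilder` and Cα–Cα for `CaPPBuilder`; the example uses the Cα–Cα distances around residue 822 reported further down in this thread.

```python
def build_segments(labels, link_distances, radius):
    """Group consecutive residues into polypeptide-like segments.

    Hypothetical helper. labels[i] names residue i and link_distances[i]
    is the distance (Å) between residue i and residue i + 1, so it has
    len(labels) - 1 entries. Segments of a single residue are discarded.
    """
    segments = []
    current = labels[:1]  # start with the first residue, if any
    for i, dist in enumerate(link_distances):
        if dist < radius:
            current.append(labels[i + 1])  # still connected: extend segment
        else:
            if len(current) > 1:  # close the segment, keep it only if >= 2 residues
                segments.append(current)
            current = [labels[i + 1]]
    if len(current) > 1:
        segments.append(current)
    return segments


labels = ["GLU821", "THR822", "ALA823", "ASP824"]
ca_links = [4.426, 4.679, 4.331]  # CA-CA distances reported in this thread

print(build_segments(labels, ca_links, 4.0))  # []
print(build_segments(labels, ca_links, 4.5))  # [['GLU821', 'THR822'], ['ALA823', 'ASP824']]
print(build_segments(labels, ca_links, 4.7))  # [['GLU821', 'THR822', 'ALA823', 'ASP824']]
```

In Biopython itself the cutoff is the `radius` argument of the builder, e.g. `PPBuilder(radius=2.0)` or `CaPPBuilder(radius=4.7)`; those particular values are only examples, and loosening the cutoff too far can also hide genuine chain breaks.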

@DimaMolod
Author

Hi @JoaoRodrigues, and many thanks for your suggestions and help!
Actually, I already uploaded the model as ranked_3.pdb.txt in my first comment. Sorry, I changed the extension because .pdb files are not allowed as attachments. Please let me know whether you can download and test this model.
Many thanks again for your help,
Dima

@DimaMolod
Author

P.S. This is a recurring issue with AlphaFold2 models, though a fairly rare one. Indeed, the distance threshold might need to be adjusted slightly.

@JoaoRodrigues
Member

Ah, sorry, I missed the attachment.

> P.S. This is a recurring issue with AlphaFold2 models, though a fairly rare one. Indeed, the distance threshold might need to be adjusted slightly.

Yeah, roughly 1/3 of the CA–CA distances in that model are stretched :) Even with a 4.0 Å cutoff, there are still some violations:

B:GLN3 - B:ASN2 = 4.010 *
B:LEU32 - B:LEU31 = 4.024 *
B:ASP33 - B:LEU32 = 4.090 *
B:GLY90 - B:ARG89 = 4.007 *
B:GLU150 - B:GLU149 = 4.024 *
B:LYS427 - B:GLN426 = 4.022 *
B:MET507 - B:ARG506 = 4.018 *
B:GLU508 - B:MET507 = 4.014 *
B:LEU510 - B:LYS509 = 4.188 *
B:THR513 - B:GLU512 = 4.010 *
B:ILE514 - B:THR513 = 4.271 *
B:TYR589 - B:SER588 = 4.054 *
B:GLY669 - B:THR668 = 4.003 *
B:THR819 - B:MET818 = 4.151 *
B:ARG820 - B:THR819 = 4.073 *
B:THR822 - B:GLU821 = 4.426 *
B:ALA823 - B:THR822 = 4.679 *
B:ASP824 - B:ALA823 = 4.331 *
B:THR825 - B:ASP824 = 4.180 *
B:ASP841 - B:SER840 = 4.117 *
B:CYS862 - B:TYR861 = 4.040 *
B:ARG865 - B:LYS864 = 4.070 *
B:ALA868 - B:PRO867 = 4.132 *
B:GLY877 - B:PRO876 = 4.082 *
C:MET1 - B:LEU894 = 34.938 *  -> This one is obviously a chain break
C:ASP5 - C:GLU4 = 4.012 *
C:GLU6 - C:ASP5 = 4.083 *
C:GLY48 - C:VAL47 = 4.060 *
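The largest of these stretched links shows how far a Cα–Cα cutoff would have to be loosened before chain B stops splitting. A minimal sketch that parses the report above; the regular expression assumes exactly the `A - B = d *` layout shown, and the 10 Å chain-break threshold is an arbitrary assumption.

```python
import re

# Report lines copied from the comment above (residue pair, CA-CA distance in Å).
REPORT = """\
B:GLN3 - B:ASN2 = 4.010 *
B:LEU32 - B:LEU31 = 4.024 *
B:ASP33 - B:LEU32 = 4.090 *
B:GLY90 - B:ARG89 = 4.007 *
B:GLU150 - B:GLU149 = 4.024 *
B:LYS427 - B:GLN426 = 4.022 *
B:MET507 - B:ARG506 = 4.018 *
B:GLU508 - B:MET507 = 4.014 *
B:LEU510 - B:LYS509 = 4.188 *
B:THR513 - B:GLU512 = 4.010 *
B:ILE514 - B:THR513 = 4.271 *
B:TYR589 - B:SER588 = 4.054 *
B:GLY669 - B:THR668 = 4.003 *
B:THR819 - B:MET818 = 4.151 *
B:ARG820 - B:THR819 = 4.073 *
B:THR822 - B:GLU821 = 4.426 *
B:ALA823 - B:THR822 = 4.679 *
B:ASP824 - B:ALA823 = 4.331 *
B:THR825 - B:ASP824 = 4.180 *
B:ASP841 - B:SER840 = 4.117 *
B:CYS862 - B:TYR861 = 4.040 *
B:ARG865 - B:LYS864 = 4.070 *
B:ALA868 - B:PRO867 = 4.132 *
B:GLY877 - B:PRO876 = 4.082 *
C:MET1 - B:LEU894 = 34.938 *
C:ASP5 - C:GLU4 = 4.012 *
C:GLU6 - C:ASP5 = 4.083 *
C:GLY48 - C:VAL47 = 4.060 *
"""

CHAIN_BREAK = 10.0  # assumed: anything longer is a genuine chain break
LINE_RE = re.compile(r"^(\S+) - (\S+) = ([\d.]+)")


def parse_report(text):
    """Yield (residue, previous_residue, distance) tuples from the report."""
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            yield m.group(1), m.group(2), float(m.group(3))


links = list(parse_report(REPORT))
stretched = [link for link in links if link[2] < CHAIN_BREAK]
worst = max(stretched, key=lambda link: link[2])
print(len(links), len(stretched))  # 28 27
print(worst)  # ('B:ALA823', 'B:THR822', 4.679)
```

For this model the worst link is ALA823–THR822 at 4.679 Å, right next to the skipped residue, so a Cα cutoff would have to exceed roughly 4.68 Å to keep chain B in one piece. Treat that number as a diagnostic rather than a recommended setting.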

@JoaoRodrigues
Member

Whatever you are doing with these models, I'd recommend a very gentle restrained minimization before any analysis.
