Some residues are skipped by the PPBuilder() #4746

DimaMolod · 2024-06-17T10:29:43Z

Setup

I am reporting a problem with Biopython version 1.83, Python version 3.10.0, and operating
system Rocky Linux 8.9 (Green Obsidian) as follows:

import sys; print(sys.version)
import platform; print(platform.python_implementation()); print(platform.platform())
import Bio; print(Bio.__version__)


3.10.0 | packaged by conda-forge | (default, Nov 20 2021, 02:24:10) [GCC 9.4.0]
CPython
Linux-4.18.0-513.9.1.el8_9.x86_64-x86_64-with-glibc2.28
1.83

Expected behaviour

output:
...
820
52.78
821
40.0
822
35.76
823
40.09
...

Actual behaviour

Residue 822 is skipped:
output:
...
820
52.78
821
40.0
822
40.09
823
34.85
...

Steps to reproduce

Use this file to reproduce: ranked_3.pdb.txt

from Bio.PDB import PDBParser, PPBuilder

file = "ranked_3.pdb"
structure = PDBParser().get_structure("rank3", file)
polypeptides = PPBuilder().build_peptides(structure, aa_only=False)

# Running count of residues across all polypeptides, followed by the
# B-factor of each residue's first atom.
i = 0
for chn_i in polypeptides:
    for res_i in chn_i:
        i += 1
        print(i)
        print(next(res_i.get_atoms()).bfactor)
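
One way to see exactly which residues are dropped, rather than inferring it from the running counter, is to compare the residues in the structure with those that end up in any polypeptide. This is a minimal sketch that relies on Biopython's `get_residues()` and `get_full_id()` methods; the helper names `skipped_residues` and `report_skipped` are made up for illustration.

```python
def skipped_residues(structure, polypeptides):
    """Return full ids of residues in `structure` that are in no polypeptide."""
    covered = {res.get_full_id() for pp in polypeptides for res in pp}
    return [
        res.get_full_id()
        for res in structure.get_residues()
        if res.get_full_id() not in covered
    ]


def report_skipped(pdb_path):
    """Print the residues that PPBuilder leaves out of its polypeptides."""
    # Imported here so skipped_residues() stays usable without Biopython.
    from Bio.PDB import PDBParser, PPBuilder

    structure = PDBParser(QUIET=True).get_structure("model", pdb_path)
    polypeptides = PPBuilder().build_peptides(structure, aa_only=False)
    for full_id in skipped_residues(structure, polypeptides):
        # full_id is (structure id, model id, chain id, residue id)
        print(full_id)
```

Calling `report_skipped("ranked_3.pdb")` should list the dropped residues by chain and residue number. As far as I can tell from the `build_peptides` source, a residue whose links to both neighbours exceed the cutoff never joins any polypeptide, which would explain a single missing residue.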

peterjc · 2024-06-17T12:58:47Z

Is there anything odd about this residue, e.g. the C–N distance which the PPBuilder uses to find polypeptides?

DimaMolod · 2024-06-17T13:32:37Z

The distance between CA and N seems to be around 1.5 A for residue 822:
`ATOM 6742 N THR B 822 -8.531 -23.264 12.685 1.00 35.76 N
ATOM 6743 CA THR B 822 -9.413 -23.716 13.755 1.00 35.76 C
ATOM 6744 C THR B 822 -10.876 -23.606 13.333 1.00 35.76 C
ATOM 6745 CB THR B 822 -9.186 -22.906 15.045 1.00 35.76 C
ATOM 6746 O THR B 822 -11.761 -24.155 13.993 1.00 35.76 O
ATOM 6747 CG2 THR B 822 -7.945 -23.390 15.787 1.00 35.76 C
ATOM 6748 OG1 THR B 822 -9.020 -21.522 14.711 1.00 35.76 O`
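
For reference, the roughly 1.5 A value is the N-CA bond inside residue 822, which is a normal length. The distance PPBuilder tests is a different one: between the C atom of one residue and the N atom of the next. The minimal sketch below uses only the coordinates pasted above, so it can check the intra-residue bonds but not the peptide bonds to residues 821 and 823; the helper name `bond_length` is made up for illustration.

```python
import math

# Coordinates of THR B 822 copied from the ATOM records above (in angstroms).
N_822 = (-8.531, -23.264, 12.685)
CA_822 = (-9.413, -23.716, 13.755)
C_822 = (-10.876, -23.606, 13.333)


def bond_length(a, b):
    """Euclidean distance between two (x, y, z) points."""
    return math.dist(a, b)


# Both are bonds *within* residue 822, so they say nothing about the
# C(821)-N(822) or C(822)-N(823) peptide bonds that PPBuilder checks.
n_ca = bond_length(N_822, CA_822)
ca_c = bond_length(CA_822, C_822)
print(f"N-CA: {n_ca:.2f} A")
print(f"CA-C: {ca_c:.2f} A")
```

To test the peptide bonds themselves, the same function would need the C coordinates of GLU 821 and the N coordinates of ALA 823, which are not shown in this thread.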

DimaMolod · 2024-06-17T13:33:38Z

This is an AlphaFold2 prediction, so there could actually be something wrong with the distances.

JoaoRodrigues · 2024-06-21T13:30:37Z

Hi @DimaMolod, thanks for reporting this. You can pass a distance cutoff to PPBuilder(), which by default is set to 1.8A. For predicted models, this might be a little off. Alternatively, you can use the more robust CaPPBuilder(), which checks distances between Ca atoms at a distance of 3.8A. If you can share the model, it would be great to help us debug!
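
Both suggestions can be tried side by side. The following is a minimal sketch: `PPBuilder` and `CaPPBuilder` both accept a `radius` argument, but the 2.0 A value used here is an arbitrary illustration rather than a recommended setting, and the helper names are made up for this example.

```python
def residues_in_peptides(builder, entity):
    """Count the residues that `builder` places into polypeptides."""
    return sum(len(pp) for pp in builder.build_peptides(entity, aa_only=False))


def compare_builders(pdb_path, cn_radius=2.0):
    """Compare default, relaxed C-N, and CA-CA based builders on one file."""
    # Imported here so residues_in_peptides() stays usable without Biopython.
    from Bio.PDB import CaPPBuilder, PDBParser, PPBuilder

    structure = PDBParser(QUIET=True).get_structure("model", pdb_path)
    builders = {
        "PPBuilder (default radius)": PPBuilder(),
        f"PPBuilder (radius={cn_radius})": PPBuilder(radius=cn_radius),
        "CaPPBuilder (default radius)": CaPPBuilder(),
    }
    return {
        name: residues_in_peptides(builder, structure)
        for name, builder in builders.items()
    }
```

Comparing the counts returned by `compare_builders("ranked_3.pdb")` with the number of residues in the file shows how many residues each setting drops.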

DimaMolod · 2024-06-21T14:26:37Z

Hi @JoaoRodrigues and many thanks for your suggestions and help!
Actually, I already uploaded the model as ranked_3.pdb.txt in my first comment. Sorry, I changed the extension because pdb files are not allowed. Please let me know if you can download and test this model.
Many thanks again for your help,
Dima

DimaMolod · 2024-06-21T14:30:35Z

P.S. This happens with AlphaFold2 models recurrently, though fairly rarely. Indeed, the distance threshold might need to be slightly adjusted.

JoaoRodrigues · 2024-06-21T16:53:10Z

Ah sorry, I missed the attachment..

> P.S. This happens with AlphaFold2 models recurrently, though fairly rarely. Indeed, the distance threshold might need to be slightly adjusted.

Yeah, roughly 1/3 of the CA-CA bonds in that model are stretched :) Even at 4.0A, you have some violations:

B:GLN3 - B:ASN2 = 4.010 *
B:LEU32 - B:LEU31 = 4.024 *
B:ASP33 - B:LEU32 = 4.090 *
B:GLY90 - B:ARG89 = 4.007 *
B:GLU150 - B:GLU149 = 4.024 *
B:LYS427 - B:GLN426 = 4.022 *
B:MET507 - B:ARG506 = 4.018 *
B:GLU508 - B:MET507 = 4.014 *
B:LEU510 - B:LYS509 = 4.188 *
B:THR513 - B:GLU512 = 4.010 *
B:ILE514 - B:THR513 = 4.271 *
B:TYR589 - B:SER588 = 4.054 *
B:GLY669 - B:THR668 = 4.003 *
B:THR819 - B:MET818 = 4.151 *
B:ARG820 - B:THR819 = 4.073 *
B:THR822 - B:GLU821 = 4.426 *
B:ALA823 - B:THR822 = 4.679 *
B:ASP824 - B:ALA823 = 4.331 *
B:THR825 - B:ASP824 = 4.180 *
B:ASP841 - B:SER840 = 4.117 *
B:CYS862 - B:TYR861 = 4.040 *
B:ARG865 - B:LYS864 = 4.070 *
B:ALA868 - B:PRO867 = 4.132 *
B:GLY877 - B:PRO876 = 4.082 *
C:MET1 - B:LEU894 = 34.938 *  -> This one is obviously a chain break
C:ASP5 - C:GLU4 = 4.012 *
C:GLU6 - C:ASP5 = 4.083 *
C:GLY48 - C:VAL47 = 4.060 *
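
A list like the one above can be produced by measuring the distance between consecutive CA atoms. This is a minimal sketch and not necessarily how the list above was generated; it works on plain `(label, (x, y, z))` pairs, and the function name and the 4.0 A default are illustrative.

```python
import math


def stretched_ca_pairs(ca_atoms, cutoff=4.0):
    """Yield (label, previous_label, distance) for consecutive CA atoms
    that are farther apart than `cutoff`."""
    for (prev_label, prev_xyz), (label, xyz) in zip(ca_atoms, ca_atoms[1:]):
        distance = math.dist(prev_xyz, xyz)
        if distance > cutoff:
            yield label, prev_label, distance


# Toy coordinates (not taken from ranked_3.pdb): only the second gap is too long.
example = [
    ("B:GLU821", (0.0, 0.0, 0.0)),
    ("B:THR822", (3.8, 0.0, 0.0)),
    ("B:ALA823", (8.3, 0.0, 0.0)),
]
for label, prev_label, distance in stretched_ca_pairs(example):
    print(f"{label} - {prev_label} = {distance:.3f} *")
```

With Biopython, the input pairs could be built from each residue that has a CA atom, for example `(f"{chain.id}:{res.get_resname()}{res.id[1]}", tuple(res["CA"].coord))`.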

JoaoRodrigues · 2024-06-21T16:53:54Z

Whatever you are doing with these models, I'd recommend a very gentle restrained minimization before any analysis.

JoaoRodrigues self-assigned this Jun 18, 2024
