Reading an entire directory of .pdb files with BioPython


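I was recently tasked with writing a Python program that finds the atoms within 2 ångströms of every metal in a protein, given a .pdb (Protein Data Bank) file. This is the script I wrote for it: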

from Bio.PDB import *
parser = PDBParser(PERMISSIVE=True)

def print_coordinates(list):
    neighborList = list
    for y in neighborList:
        print "     ", y.get_coord()

structure_id = '5m6n'
fileName = '5m6n.pdb'
structure = parser.get_structure(structure_id, fileName)

atomList = Selection.unfold_entities(structure, 'A')

ns = NeighborSearch(atomList)

for x in structure.get_atoms():
    if x.name == 'ZN' or x.name == 'FE' or x.name == 'CU' or x.name == 'MG' or x.name == 'CA' or x.name == 'MN':
        center = x.get_coord()
        neighbors = ns.search(center,2.0)
        neighborList = Selection.unfold_entities(neighbors, 'A')

        print x.get_id(), ': ', neighborList
        print_coordinates(neighborList)
    else:
        continue

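But this handles only a single .pdb file, and I would like to be able to read an entire directory of them. Since I have only used Java until now, I am not entirely sure how to do this in Python 2.7. One idea I had was to put the script inside a try-catch statement with a while loop in it, and throw an exception once it runs out of files. That is how I would do it in Java, but I am not sure how to do it in Python. So I would love to hear any ideas or sample code anyone might have.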

There is some redundancy in your code; for example, this does the same thing:

from Bio.PDB import *
parser = PDBParser(PERMISSIVE=True)

def print_coordinates(neighborList):
    for y in neighborList:
        print "     ", y.get_coord()

structure_id = '5m6n'
fileName = '5m6n.pdb'
structure = parser.get_structure(structure_id, fileName)
metals = ['ZN', 'FE', 'CU', 'MG', 'CA', 'MN']

atomList = [atom for atom in structure.get_atoms() if atom.name in metals]
ns = NeighborSearch(Selection.unfold_entities(structure, 'A'))

for atom in atomList:
    neighbors = ns.search(atom.coord, 2)
    print("{0}: {1}".format(atom.name, neighbors))
    print_coordinates(neighbors)
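
One caveat about selecting metals by atom.name: in PDB files the alpha carbon of every residue is also named CA, so a name-based test will treat each of them as calcium. A possible workaround, assuming your Biopython version exposes the element symbol as atom.element (worth checking against the Bio.PDB documentation), is to test the element instead. The sketch below is written for Python 3; is_metal, METALS and FakeAtom are made-up names, and FakeAtom is only a stand-in so the example runs without Biopython.

```python
from collections import namedtuple

# Element symbols treated as metals; same six as in the question.
METALS = frozenset(["ZN", "FE", "CU", "MG", "CA", "MN"])


def is_metal(atom):
    """Return True if the atom's element symbol is one of METALS.

    Assumes `atom` has an `element` attribute holding the element symbol,
    as Bio.PDB atoms are expected to.
    """
    element = (atom.element or "").strip().upper()
    return element in METALS


# Stand-in for Bio.PDB atoms (hypothetical, for illustration only).
FakeAtom = namedtuple("FakeAtom", ["name", "element"])
alpha_carbon = FakeAtom(name="CA", element="C")   # backbone C-alpha
calcium = FakeAtom(name="CA", element="CA")       # calcium ion
zinc = FakeAtom(name="ZN", element="ZN")

metals_found = [a for a in (alpha_carbon, calcium, zinc) if is_metal(a)]
```

With real structures the same filter would read [atom for atom in structure.get_atoms() if is_metal(atom)]. Note also that a neighbor search centred on a metal will include the metal atom itself, at distance 0.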

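To answer your question, you can use the glob module to get a list of all the pdb files, and nest your code inside a loop that iterates over all of the files. Assuming your pdb files are located in /home/pdb_files/:
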
from Bio.PDB import *
from glob import glob
parser = PDBParser(PERMISSIVE=True)
pdb_files = glob('/home/pdb_files/*.pdb')

def print_coordinates(neighborList):
    for y in neighborList:
        print "     ", y.get_coord()

for fileName in pdb_files:
    structure_id = fileName.rsplit('/', 1)[1][:-4]
    structure = parser.get_structure(structure_id, fileName)
    # The rest of your code
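
Since the question mentions wrapping things in try-catch: in Python a plain for loop ends on its own when the files run out, so no exception is needed for that. A try/except is still handy for skipping files that fail to parse. Here is a minimal sketch of that idea using only the standard library, written for Python 3. The names iter_pdb_files, process_directory and process_file are hypothetical; process_file is whatever function you supply to do the parsing and neighbor search.

```python
import glob
import os


def iter_pdb_files(directory):
    """Yield (structure_id, path) for every .pdb file in `directory`, sorted by name."""
    pattern = os.path.join(directory, "*.pdb")
    for path in sorted(glob.glob(pattern)):
        # "5m6n.pdb" -> "5m6n", independent of the path separator
        structure_id = os.path.splitext(os.path.basename(path))[0]
        yield structure_id, path


def process_directory(directory, process_file):
    """Call process_file(structure_id, path) for each .pdb file.

    Files that raise are recorded and skipped, so one bad file
    does not stop the whole run. Returns a list of (path, exception).
    """
    failures = []
    for structure_id, path in iter_pdb_files(directory):
        try:
            process_file(structure_id, path)
        except Exception as exc:  # deliberately broad: keep going on any failure
            failures.append((path, exc))
    return failures
```

In your case process_file would call parser.get_structure(structure_id, path) and then run the metal search on the result. Using os.path.splitext rather than [:-4] also keeps working if an extension is not exactly four characters long.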

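You might want to look at the os module. As an aside, you can also replace the chain of x.name == ... or ... comparisons with if x.name in ('ZN', 'FE', 'CU', 'MG', 'CA', 'MN').
@asongtoruin Thanks for the suggestion, I will look into the module you mentioned.
Another small hint: naming a variable (list) like a type will cause trouble at some point. Just use neighborList as the function argument; then you can also skip the first line of the function.
@MaximilianPeters Yes, it is just a habit, mostly because it feels a bit unsafe to me that Python does not let you declare the type of each variable.
Thank you for this. Your help not only solved my problem but also eased my transition to Python. Thanks again!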
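
To illustrate the naming hint from the comments: a parameter called list hides the built-in list type inside the function, so any later attempt to call list(...) there fails. A small sketch for Python 3, with made-up function names:

```python
def broken_copy(list):
    # Here `list` is the argument, not the built-in type, so this line
    # tries to call the argument itself and raises TypeError.
    return list(list)


def working_copy(neighbor_list):
    # With a different parameter name, the built-in `list` is still reachable.
    return list(neighbor_list)


try:
    broken_copy([1, 2, 3])
    shadowing_error = None
except TypeError as exc:
    shadowing_error = exc

copied = working_copy((1, 2, 3))
```

The original script never calls list(...) inside print_coordinates, which is why it ran fine; the problem only shows up once the function grows.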