Parsing a .pdb file in Python and creating a dictionary for specific record types

First of all, I am doing this as a Python exercise, and I am not allowed to use Biopython.

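I am writing a script that will help me parse any .pdb file generated from a trajectory. I am trying to create a dictionary that links the chain variable to the resnumber variable. I have solved this for one specific .pdb file that has only 2 chains, but I would like the script to work for any .pdb file, regardless of the number of chains. Here is what I wrote: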

import sys

pdbTraj = open('md20_aligned_3frames.pdb','r')
pdbTraj_line = pdbTraj.readlines()
newFile = open('newfile.txt','w')
pdbDict = {}
resnumberList1 = []
resnumberList2 = []
chainTry = "A"
for line in pdbTraj_line:
    if line.startswith(("ATOM", "HETATM")):  # a tuple of prefixes matches either record type
        atomType = line[0:6]
        atomSerialNumber = line[6:11]
        atomName = line[12:16]
        resname = line[17:20]
        chain = line[21]
        resnumber = line[22:26]
        coorX = line[30:38]
        coorY = line[38:46]
        coorZ = line[46:54]
        occupancy = line[54:60]
        temperatureFact = line[60:66]
        segmentIdentifier = line[72:76]
        elementSymbol = line[76:78]
        if chain == chainTry:
            resnumberList1.append(resnumber)
            pdbDict[chain] = list(dict.fromkeys(resnumberList1))
        else:
            resnumberList2.append(resnumber)
            pdbDict[chain] = list(dict.fromkeys(resnumberList2))

print(pdbDict)

This is the result I get:

{'A': ['   1','   2','   3','   4','   5','   6','   7','   8','   9','  10','  11','  12','  13','  14','  15','  16','  17'],'B': ['  19','  20','  21','  22','  23','  24','  25','  26','  27','  28','  29','  30','  31','  32','  33','  34','  35','  36','  37','  38','  39','  40','  41','  42','  43','  44','  45','  46','  47','  48','  49','  50','  51','  52','  53','  54','  55','  56','  57','  58','  59','  60','  61','  62','  63','  64','  65','  66','  67','  68','  69','  70','  71','  72','  73','  74','  75','  76','  77','  78','  79','  80','  81','  82','  83','  84','  85','  86','  87','  88','  89','  90','  91','  92','  93','  94','  95','  96','  97','  98','  99',' 100',' 101',' 102',' 103',' 104',' 105',' 106',' 107',' 108',' 109',' 110',' 111',' 112',' 113',' 114',' 115',' 116',' 117',' 118',' 119',' 120',' 121',' 122',' 123',' 124',' 125',' 126',' 127',' 128',' 129',' 130',' 131',' 132',' 133',' 134',' 135',' 136',' 137',' 138',' 139',' 140',' 141',' 142',' 143',' 144',' 145',' 146',' 147',' 148',' 149',' 150',' 151',' 152',' 153',' 154',' 155',' 156',' 157',' 158',' 159',' 160',' 161',' 162',' 163',' 164',' 165',' 166',' 167',' 168',' 169',' 170',' 171',' 172',' 173',' 174',' 175',' 176',' 177',' 178',' 179',' 180',' 181',' 182',' 183',' 184',' 185',' 186',' 187',' 188',' 189',' 190',' 191',' 192',' 193',' 194',' 195',' 196',' 197',' 198',' 199',' 200',' 201',' 202',' 203',' 204',' 205',' 206',' 207',' 208',' 209',' 210',' 211',' 212',' 213',' 214',' 215',' 216',' 217',' 218',' 219',' 220',' 221',' 222',' 223',' 224',' 225',' 226',' 227',' 228',' 229',' 230',' 231',' 232',' 233',' 234',' 235',' 236',' 237',' 238',' 239',' 240',' 241',' 242',' 243',' 244',' 245',' 246',' 247',' 248',' 249',' 250',' 251',' 252',' 253',' 254',' 255',' 256',' 257',' 258',' 259',' 260',' 261',' 262',' 263',' 264',' 265',' 266',' 267',' 268',' 269',' 270',' 271',' 272',' 273',' 274',' 275',' 276',' 277',' 278',' 279',' 280',' 281',' 282',' 283',' 284',' 285',' 286',' 287',' 288',' 289',' 290',' 291',' 292',' 293',' 294',' 295',' 296',' 297',' 298',' 299',' 300',' 301',' 302',' 303',' 304',' 305',' 306',' 307',' 308',' 309',' 310',' 311',' 312',' 313',' 314',' 315',' 316',' 317',' 318',' 319',' 320',' 321',' 322',' 323',' 324',' 325',' 326',' 327',' 328',' 329',' 330',' 331',' 332',' 333',' 334',' 335',' 336',' 337',' 338',' 339',' 340',' 341',' 342',' 343',' 344',' 345',' 346',' 347',' 348',' 349',' 350',' 351',' 352',' 353',' 354',' 355',' 356',' 357',' 358',' 359',' 360',' 361',' 362',' 363',' 364',' 365',' 366',' 367',' 368',' 369',' 370',' 371']}

So there are 2 keys (chain A and chain B) and 2 lists (the resnumbers of chain A and the resnumbers of chain B).

Could you help me generalize the script so that it works for any .pdb file? Thanks!

The first few lines of the .pdb file look like this:

CRYST1   91.372  118.560   70.786  90.00  90.00  90.00 P 1           1
ATOM      1  N   LYS A   1      10.246  29.908   8.932  0.00  0.00      A     
ATOM      2  HT1 LYS A   1      11.053  29.331   8.619  0.00  0.00      A     
ATOM      3  HT2 LYS A   1      10.405  30.386   9.842  0.00  0.00      A     
ATOM      4  HT3 LYS A   1      10.211  30.643   8.197  0.00  0.00      A     
ATOM      5  CA  LYS A   1       9.010  29.017   8.844  0.00  0.00      A     
ATOM      6  HA  LYS A   1       9.395  28.160   8.311  0.00  0.00      A     
ATOM      7  CB  LYS A   1       8.484  28.723  10.313  0.00  0.00      A     
ATOM      8  HB1 LYS A   1       9.376  28.807  10.970  0.00  0.00      A     
ATOM      9  HB2 LYS A   1       7.797  29.544  10.609  0.00  0.00      A     
ATOM     10  CG  LYS A   1       7.855  27.321  10.494  0.00  0.00      A     
ATOM     11  HG1 LYS A   1       7.016  27.501  11.199  0.00  0.00      A     
ATOM     12  HG2 LYS A   1       7.294  26.942   9.613  0.00  0.00      A     
ATOM     13  CD  LYS A   1       8.769  26.282  10.991  0.00  0.00      A     
ATOM     14  HD1 LYS A   1       9.376  26.065  10.088  0.00  0.00      A     
ATOM     15  HD2 LYS A   1       9.476  26.682  11.750  0.00  0.00      A     
ATOM     16  CE  LYS A   1       7.894  25.110  11.592  0.00  0.00      A     
ATOM     17  HE1 LYS A   1       7.347  25.505  12.475  0.00  0.00      A    

And here are some lines from chain B:

ATOM   3802  N   TYR B 240      -9.050 -41.325  16.074  0.00  0.00      B     
ATOM   3803  HN  TYR B 240      -8.672 -40.404  16.021  0.00  0.00      B     
ATOM   3804  CA  TYR B 240     -10.166 -41.491  15.204  0.00  0.00      B     
ATOM   3805  HA  TYR B 240      -9.685 -41.605  14.243  0.00  0.00      B     
ATOM   3806  CB  TYR B 240     -10.940 -42.818  15.365  0.00  0.00      B     
ATOM   3807  HB1 TYR B 240     -10.241 -43.631  15.078  0.00  0.00      B     
ATOM   3808  HB2 TYR B 240     -11.241 -43.061  16.407  0.00  0.00      B     
ATOM   3809  CG  TYR B 240     -12.233 -42.972  14.454  0.00  0.00      B     
ATOM   3810  CD1 TYR B 240     -12.102 -43.272  13.086  0.00  0.00      B     
ATOM   3811  HD1 TYR B 240     -11.100 -43.348  12.692  0.00  0.00      B     
ATOM   3812  CE1 TYR B 240     -13.248 -43.404  12.343  0.00  0.00      B     
ATOM   3813  HE1 TYR B 240     -13.093 -43.818  11.358  0.00  0.00      B     

If you need more information about the .pdb file format, here is a link.

Answer:

Here is how I would solve it:

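# Create an empty dictionary
pdb_dict = {}

# 1. Create a list containing each line of the file as a list of fields.
# list(...) is needed in Python 3, where filter() returns an iterator, not a list.
with open('test.pdb', 'r') as pdb_file:
    all_lines = [list(filter(lambda x: x != '', line.strip('\n').split(' '))) for line in pdb_file.readlines()]

# 2. Use a set comprehension to identify all unique chains in the file
# (this approach assumes that there will be no two different chains with
# the same name in the same file).
# "line and" skips blank lines; "in (...)" matches both ATOM and HETATM.
# line[-1] is the last column (the segment ID, which matches the chain in this file).
chains = {line[-1] for line in all_lines if line and line[0] in ('ATOM', 'HETATM')}

# 3. Create a dictionary key for each chain and collect field 1 of each of its lines
for chain in chains:
    pdb_dict[chain] = [line[1] for line in all_lines if line and line[0] in ('ATOM', 'HETATM') and line[-1] == chain]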

This approach consists of three steps:

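First, you read the file into a list of lists. Since the values in the file are separated by spaces, you can split each line obtained from open('test.pdb','r').readlines(). Because the number of spaces between values varies, some items in each resulting list will be empty strings. The lambda function passed to filter then removes every empty string ('') from the list (that is, from one line of the file). Now you can access each value by its index, which corresponds to a whitespace-separated column of the pdb file (counting from 0).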
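A quick check of the splitting step on one sample line, copied from the file above: calling str.split() with no argument already splits on runs of whitespace and drops the empty strings, so it can stand in for split(' ') plus filter and the lambda. This is only a small illustration of the equivalence.

```python
line = "ATOM      1  N   LYS A   1      10.246  29.908   8.932  0.00  0.00      A\n"

# split(' ') produces an empty string for every extra space
with_empties = line.strip('\n').split(' ')

# filter out the empty strings afterwards, as in the answer
filtered = list(filter(lambda x: x != '', with_empties))

# str.split() with no argument splits on any whitespace run and drops empties
fields = line.split()

print(filtered == fields)  # True
print(fields[:6])          # ['ATOM', '1', 'N', 'LYS', 'A', '1']
```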

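Second, you iterate over the list of lines created earlier and identify all the unique chains in the file. That is what the set comprehension does.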
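A pitfall to watch for in the filtering condition: writing line[0] == ('ATOM' or 'HETATM') does not test for either record type. The parenthesized or is evaluated first, and because 'ATOM' is a non-empty (truthy) string the expression is simply 'ATOM', so HETATM lines never match. A membership test against a tuple matches both. The HETATM line below is hypothetical, added only to show the difference.

```python
rows = [
    "CRYST1   91.372  118.560   70.786  90.00  90.00  90.00 P 1           1".split(),
    "ATOM      1  N   LYS A   1      10.246  29.908   8.932  0.00  0.00      A".split(),
    # hypothetical HETATM line (a water), not from the original file
    "HETATM    2  O   HOH W   1       1.000   2.000   3.000  0.00  0.00      W".split(),
    [],  # a blank line splits to an empty list
]

print('ATOM' or 'HETATM')  # prints ATOM: the `or` is resolved before the comparison

# equality against ('ATOM' or 'HETATM') only ever matches ATOM lines
only_atom = {r[-1] for r in rows if r and r[0] == ('ATOM' or 'HETATM')}

# membership test matches both record types; `r and` skips blank lines
both = {r[-1] for r in rows if r and r[0] in ('ATOM', 'HETATM')}

print(only_atom)     # {'A'}
print(sorted(both))  # ['A', 'W']
```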

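Finally, you iterate over the set of chains. For each chain, you create a key in the dictionary and assign to it the values taken from every line that belongs to that chain. Note that index 1 of each split line is the atom serial number; the residue number the question asks for is at index 5.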
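To get what the question actually asks for (unique residue numbers per chain, for any number of chains), a single pass over the lines is enough. This is a minimal sketch, assuming the whitespace-split layout used above with the chain ID at index 4 and the residue number at index 5, which only holds when the chain ID is present and no two fields run together. collections.defaultdict(list) creates an empty list the first time a chain is seen, and dict.fromkeys (the same trick the question uses) removes duplicates while keeping file order. The function name residues_by_chain is hypothetical.

```python
from collections import defaultdict

def residues_by_chain(lines):
    """Map each chain ID to its residue numbers, in file order, without duplicates."""
    grouped = defaultdict(list)
    for raw in lines:
        fields = raw.split()
        # skip blank lines and any record that is not ATOM/HETATM (CRYST1, TER, END, ...)
        if not fields or fields[0] not in ('ATOM', 'HETATM'):
            continue
        chain, resnumber = fields[4], fields[5]  # assumes a non-blank chain ID
        grouped[chain].append(resnumber)
    # dict.fromkeys keeps the first occurrence of each residue number, in order
    return {chain: list(dict.fromkeys(nums)) for chain, nums in grouped.items()}

sample = """\
CRYST1   91.372  118.560   70.786  90.00  90.00  90.00 P 1           1
ATOM      1  N   LYS A   1      10.246  29.908   8.932  0.00  0.00      A
ATOM      2  HT1 LYS A   1      11.053  29.331   8.619  0.00  0.00      A
ATOM   1800  N   TYR B 240      -9.050 -41.325  16.074  0.00  0.00      B
ATOM   3807  HB1 TYR C 240     -10.241 -43.631  15.078  0.00  0.00      C
""".splitlines()

print(residues_by_chain(sample))
# {'A': ['1'], 'B': ['240'], 'C': ['240']}
```

To run it on a real file, pass the open file object, since iterating a file yields its lines: with open('test.pdb') as f: print(residues_by_chain(f)).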

Using the index of each value in the list, you can easily parse other values from the all_lines list created earlier, for example:

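for line in all_lines:
    # skip blank lines and non-atom records such as CRYST1
    if not line or line[0] not in ('ATOM', 'HETATM'):
        continue
    atomType = line[0]
    atomSerialNumber = line[1]
    atomName = line[2]
    # ...and so on for the remaining fields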
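Splitting on whitespace works for this sample, but PDB is a fixed-width format: when a field is blank (a missing chain ID, for example) or two fields touch (a coordinate such as -100.123 fills its whole 8-character column, leaving no space before it), the indices shift. The sketch below takes the fixed-column route instead, reusing the slice positions from the question's code and stripping the padding. The helper names parse_atom_line and residues_by_chain_fixed and the chosen field names are illustrative only, not from any library.

```python
def parse_atom_line(line):
    """Slice one ATOM/HETATM record by fixed columns (0-based Python slices)."""
    return {
        'record': line[0:6].strip(),
        'serial': int(line[6:11]),
        'name': line[12:16].strip(),
        'resname': line[17:20].strip(),
        'chain': line[21].strip(),      # '' if the chain ID column is blank
        'resnumber': int(line[22:26]),  # int() also drops the '   1' padding
        'x': float(line[30:38]),
        'y': float(line[38:46]),
        'z': float(line[46:54]),
    }

def residues_by_chain_fixed(lines):
    """Unique residue numbers per chain, in file order, for any number of chains."""
    result = {}
    for line in lines:
        if line.startswith(('ATOM', 'HETATM')):  # a tuple matches either prefix
            atom = parse_atom_line(line)
            nums = result.setdefault(atom['chain'], [])  # new list for a new chain
            if atom['resnumber'] not in nums:
                nums.append(atom['resnumber'])
    return result

sample = [
    "ATOM      1  N   LYS A   1      10.246  29.908   8.932  0.00  0.00      A",
    "ATOM      2  HT1 LYS A   1      11.053  29.331   8.619  0.00  0.00      A",
    "ATOM   1800  N   TYR B 240      -9.050 -41.325  16.074  0.00  0.00      B",
]

print(residues_by_chain_fixed(sample))  # {'A': [1], 'B': [240]}
```

Converting the residue number to int also removes the leading spaces seen in the question's output; keep the stripped string instead if residue numbers with insertion codes matter to you.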

I used the following sample file:

CRYST1   91.372  118.560   70.786  90.00  90.00  90.00 P 1           1
ATOM      1  N   LYS A   1      10.246  29.908   8.932  0.00  0.00      A
ATOM      2  HT1 LYS A   1      11.053  29.331   8.619  0.00  0.00      A
ATOM      3  HT2 LYS A   1      10.405  30.386   9.842  0.00  0.00      A
ATOM      4  HT3 LYS A   1      10.211  30.643   8.197  0.00  0.00      A
ATOM      5  CA  LYS A   1       9.010  29.017   8.844  0.00  0.00      A
ATOM      6  HA  LYS A   1       9.395  28.160   8.311  0.00  0.00      A
ATOM      7  CB  LYS A   1       8.484  28.723  10.313  0.00  0.00      A
ATOM      8  HB1 LYS A   1       9.376  28.807  10.970  0.00  0.00      A
ATOM      9  HB2 LYS A   1       7.797  29.544  10.609  0.00  0.00      A
ATOM     10  CG  LYS A   1       7.855  27.321  10.494  0.00  0.00      A
ATOM     11  HG1 LYS A   1       7.016  27.501  11.199  0.00  0.00      A
ATOM     12  HG2 LYS A   1       7.294  26.942   9.613  0.00  0.00      A
ATOM     13  CD  LYS A   1       8.769  26.282  10.991  0.00  0.00      A
ATOM     14  HD1 LYS A   1       9.376  26.065  10.088  0.00  0.00      A
ATOM     15  HD2 LYS A   1       9.476  26.682  11.750  0.00  0.00      A
ATOM     16  CE  LYS A   1       7.894  25.110  11.592  0.00  0.00      A
ATOM     17  HE1 LYS A   1       7.347  25.505  12.475  0.00  0.00      A
ATOM   1800  N   TYR B 240      -9.050 -41.325  16.074  0.00  0.00      B
ATOM   1802  HN  TYR B 240      -8.672 -40.404  16.021  0.00  0.00      B
ATOM   1803  CA  TYR B 240     -10.166 -41.491  15.204  0.00  0.00      B
ATOM   1804  HA  TYR B 240      -9.685 -41.605  14.243  0.00  0.00      B
ATOM   1805  CB  TYR B 240     -10.940 -42.818  15.365  0.00  0.00      B
ATOM   1806  HB1 TYR B 240     -10.241 -43.631  15.078  0.00  0.00      B
ATOM   1807  HB2 TYR B 240     -11.241 -43.061  16.407  0.00  0.00      B
ATOM   1808  CG  TYR B 240     -12.233 -42.972  14.454  0.00  0.00      B
ATOM   1810  CD1 TYR B 240     -12.102 -43.272  13.086  0.00  0.00      B
ATOM   1811  HD1 TYR B 240     -11.100 -43.348  12.692  0.00  0.00      B
ATOM   1812  CE1 TYR B 240     -13.248 -43.404  12.343  0.00  0.00      B
ATOM   1813  HE1 TYR B 240     -13.093 -43.818  11.358  0.00  0.00      B
ATOM   1814  N   TYR B 240      -9.050 -41.325  16.074  0.00  0.00      B
ATOM   1815  HN  TYR B 240      -8.672 -40.404  16.021  0.00  0.00      B
ATOM   1816  CA  TYR B 240     -10.166 -41.491  15.204  0.00  0.00      B
ATOM   1817  HA  TYR B 240      -9.685 -41.605  14.243  0.00  0.00      B
ATOM   1818  CB  TYR B 240     -10.940 -42.818  15.365  0.00  0.00      B
ATOM   3807  HB1 TYR C 240     -10.241 -43.631  15.078  0.00  0.00      C
ATOM   3808  HB2 TYR C 240     -11.241 -43.061  16.407  0.00  0.00      C
ATOM   3809  CG  TYR C 240     -12.233 -42.972  14.454  0.00  0.00      C
ATOM   3810  CD1 TYR C 240     -12.102 -43.272  13.086  0.00  0.00      C
ATOM   3811  HD1 TYR C 240     -11.100 -43.348  12.692  0.00  0.00      C
ATOM   3812  CE1 TYR C 240     -13.248 -43.404  12.343  0.00  0.00      C
ATOM   3813  HE1 TYR C 240     -13.093 -43.818  11.358  0.00  0.00      C

Running the code above gives the following result:

pdb_dict = {'A': ['1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17'],'C': ['3807','3808','3809','3810','3811','3812','3813'],'B': ['1800','1802','1803','1804','1805','1806','1807','1808','1810','1811','1812','1813','1814','1815','1816','1817','1818']}
