Wildcard '.' doesn't work in my Python regular expression?
So I want to extract the following amino acid abbreviations from a piece of data in .pdb format: ['GLU','PHE',…,'ASN']
ATOM 296 OE2 GLU A 43 18.414 12.323 8.758 1.00 32.23 O
ATOM 297 N PHE A 50 18.072 10.668 14.644 1.00 34.68 N
ATOM 298 CA PHE A 50 18.038 10.228 16.039 1.00 35.61 C
ATOM 299 C PHE A 50 18.501 11.321 17.019 1.00 35.86 C
ATOM 300 O PHE A 50 18.018 11.413 18.091 1.00 36.21 O
ATOM 301 CB PHE A 50 18.844 8.936 16.226 1.00 35.43 C
ATOM 302 CG PHE A 50 18.811 8.386 17.623 1.00 37.33 C
ATOM 303 CD1 PHE A 50 17.924 7.416 17.982 1.00 36.31 C
ATOM 304 CD2 PHE A 50 19.659 8.840 18.557 1.00 39.84 C
ATOM 305 CE1 PHE A 50 17.875 6.922 19.220 1.00 37.80 C
ATOM 306 CE2 PHE A 50 19.591 8.330 19.833 1.00 40.97 C
ATOM 307 CZ PHE A 50 18.709 7.368 20.144 1.00 37.91 C
ATOM 308 N ASN A 51 19.462 12.125 16.616 1.00 36.20 N ...
I used the following command in my Python script:
residue=re.compile(r"(?<=ATOM...............)+?(?=..............\.)").findall(fpdb)
Use str.split()
Ex:
s = """ATOM 296 OE2 GLU A 43 18.414 12.323 8.758 1.00 32.23 O
ATOM 297 N PHE A 50 18.072 10.668 14.644 1.00 34.68 N
ATOM 298 CA PHE A 50 18.038 10.228 16.039 1.00 35.61 C
ATOM 299 C PHE A 50 18.501 11.321 17.019 1.00 35.86 C
ATOM 300 O PHE A 50 18.018 11.413 18.091 1.00 36.21 O
ATOM 301 CB PHE A 50 18.844 8.936 16.226 1.00 35.43 C
ATOM 302 CG PHE A 50 18.811 8.386 17.623 1.00 37.33 C
ATOM 303 CD1 PHE A 50 17.924 7.416 17.982 1.00 36.31 C
ATOM 304 CD2 PHE A 50 19.659 8.840 18.557 1.00 39.84 C
ATOM 305 CE1 PHE A 50 17.875 6.922 19.220 1.00 37.80 C
ATOM 306 CE2 PHE A 50 19.591 8.330 19.833 1.00 40.97 C
ATOM 307 CZ PHE A 50 18.709 7.368 20.144 1.00 37.91 C
ATOM 308 N ASN A 51 19.462 12.125 16.616 1.00 36.20 N"""
for i in s.split("\n"):
    print(i.split()[3])
Output:
GLU
PHE
PHE
PHE
PHE
PHE
PHE
PHE
PHE
PHE
PHE
PHE
ASN
Using a list comprehension
Ex:
data = [i.split()[3] for i in s.split("\n")]
print(data)
#['GLU', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'ASN']
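The question asks for a list like ['GLU','PHE',…,'ASN'], which may mean one entry per residue rather than one per atom. That reading is an assumption. A minimal sketch under it, using a shortened, hypothetical copy of the sample and the standard-library itertools.groupby to merge consecutive lines that share the same residue name, chain and residue number:

```python
from itertools import groupby

# Hypothetical sample: a shortened copy of the ATOM records shown above.
s = """ATOM 296 OE2 GLU A 43 18.414 12.323 8.758 1.00 32.23 O
ATOM 297 N PHE A 50 18.072 10.668 14.644 1.00 34.68 N
ATOM 298 CA PHE A 50 18.038 10.228 16.039 1.00 35.61 C
ATOM 308 N ASN A 51 19.462 12.125 16.616 1.00 36.20 N"""

# Key each line by (residue name, chain, residue number): fields 3, 4 and 5.
keys = [tuple(line.split()[3:6]) for line in s.split("\n")]

# groupby merges runs of identical consecutive keys into a single group.
residues = [key[0] for key, _ in groupby(keys)]
print(residues)
# ['GLU', 'PHE', 'ASN']
```

Keying on the residue number as well as the name keeps two neighbouring residues of the same type (for example PHE 50 followed by PHE 51) as separate entries.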
Using a regular expression
Assuming there are no missing cell values, you can extract column 3 (counting from 0) when there are 12 columns:
import re
re.split(r'\s+', fpdb)[3::12]
# ['GLU', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'ASN']
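The [3::12] slice only works when every token in the input belongs to a 12-column ATOM row. If the text also contains unrelated lines, one option is to anchor the pattern on lines that start with ATOM and capture the fourth field. This is only a sketch: the surrounding HEADER/REMARK/END lines are made up for illustration, and it assumes the fields are separated by whitespace as in the sample above (real fixed-width PDB files can have fields that run together).

```python
import re

# Hypothetical input: ATOM records surrounded by unrelated lines.
fpdb = """HEADER some unrelated text
ATOM 296 OE2 GLU A 43 18.414 12.323 8.758 1.00 32.23 O
ATOM 297 N PHE A 50 18.072 10.668 14.644 1.00 34.68 N
REMARK more unrelated text
ATOM 308 N ASN A 51 19.462 12.125 16.616 1.00 36.20 N
END"""

# With re.MULTILINE, ^ matches at the start of every line.
# After "ATOM", skip two space-separated fields and capture the next one.
pattern = re.compile(r"^ATOM[ \t]+\S+[ \t]+\S+[ \t]+(\S+)", re.MULTILINE)

residues = pattern.findall(fpdb)
print(residues)
# ['GLU', 'PHE', 'ASN']
```

Because the pattern has a single capturing group, findall returns just the captured residue names, and lines that do not start with ATOM are skipped without any pre-extraction step.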
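The problem is that I have a lot of unrelated text before and after this snippet… so with your method I would have to extract the snippet first… but it still makes sense. Thank you!
Try using (?
Perhaps a simpler way to say "I want 10 dots" is r".{10}" rather than r"..........". Debugging repeated characters gets difficult after about 3 minutes.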
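A small check of the counted-quantifier suggestion: the sketch below confirms that r".{10}" matches the same text as ten dots written out, and also shows that by default '.' does not match a newline unless re.DOTALL is passed. The sample string is just the start of the first ATOM line.

```python
import re

text = "ATOM 296 OE2 GLU A 43"

dots = re.compile(r"..........")   # ten wildcard dots written out
counted = re.compile(r".{10}")     # the same pattern with a counted quantifier

# Both patterns consume exactly the first ten characters.
assert dots.match(text).group() == counted.match(text).group()
print(counted.match(text).group())
# ATOM 296 O

# By default '.' matches any character except a newline.
no_match = re.match(r".{3}", "a\nb")
with_dotall = re.match(r".{3}", "a\nb", re.DOTALL)
print(no_match is None, with_dotall is not None)
# True True
```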