Extracting part of a string in Python, with constraints

I have a string output that looks like this:

Distance AAAB: ,0.13634,0.13700,0.00080,0.00080,-0.00066,.00001,
Distance AAAC: ,0.12617,0.12680,0.00080,0.00080,-0.00063,,
Distance AAAD: ,0.17045,0.16990,0.00080,0.00080,0.00055,,
Distance AAAE: ,0.09330,0.09320,0.00080,0.00080,0.00010,,
Distance AAAF: ,0.21048,0.21100,0.00080,0.00080,-0.00052,,
Distance AAAG: ,0.02518,0.02540,0.00040,0.00040,-0.00022,,
Distance AAAH: ,0.11404,0.11450,0.00120,0.00110,-0.00046,,
Distance AAAI: ,0.10811,0.10860,0.00080,0.00070,-0.00049,,
Distance AAAJ: ,0.02430,0.02400,0.00200,0.00200,0.00030,,
Distance AAAK: ,0.09449,0.09400,0.00200,0.00100,0.00049,,
Distance AAAL: ,0.07689,0.07660,0.00050,0.00050,0.00029,
What I want to do is extract one specific set of data from this block, for example only Distance AAAH, like this:

Distance AAAH: ,0.11404,0.11450,0.00120,0.00110,-0.00046,,
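A measurement always starts with Distance AAA*, where the asterisk is the only character that changes.

Complication: this needs to be generic because I have many different data sets, so Distance AAAH will not always be followed by Distance AAAI, nor always preceded by Distance AAAG, since different projects have different measurements. I also can't rely on len(), because the last field of a measurement is sometimes blank (as in Distance AAAH) and sometimes filled in (as in Distance AAAB). I don't think I can use .find() either, because I need all the numbers that follow Distance AAAH.

I'm still a beginner. I did my best to find an existing solution to a similar problem, but had no luck.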

You can use the re module. Wrapping it in a function should be convenient.

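import re

def SearchDistance(pattern, text):
    # Let any whitespace character match where the label has a space.
    pattern = pattern.replace(' ', r'\s')
    # Search the text that was passed in, not a global variable.
    print(re.findall(r'{0}.+'.format(pattern), text))

# a holds the string output shown in the question
SearchDistance('Distance AAAH', a)
Output: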

['Distance AAAH: ,0.11404,0.11450,0.00120,0.00110,-0.00046,,']
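As a minimal sketch of the same regex idea that returns the line instead of printing it, the version below anchors the match to the start of a line and then splits the fields into numbers. The names find_measurement, parse_values and SAMPLE are hypothetical and not part of the original answer; treating empty fields as None is an assumption about how blank trailing values should be handled.

```python
import re

# A shortened copy of the data from the question (assumption: one record per line).
SAMPLE = """\
Distance AAAG: ,0.02518,0.02540,0.00040,0.00040,-0.00022,,
Distance AAAH: ,0.11404,0.11450,0.00120,0.00110,-0.00046,,
Distance AAAI: ,0.10811,0.10860,0.00080,0.00070,-0.00049,,
"""


def find_measurement(label, text):
    """Return the first full line that starts with `label:`, or None."""
    # re.escape makes the label match literally; re.MULTILINE lets ^ and $
    # match at the start and end of every line, not just the whole string.
    pattern = r'^{0}:.*$'.format(re.escape(label))
    match = re.search(pattern, text, re.MULTILINE)
    return match.group(0) if match else None


def parse_values(line):
    """Convert the comma-separated fields after the colon to floats.

    Blank fields (such as the trailing ones on Distance AAAH) become None.
    """
    _, _, rest = line.partition(':')
    return [float(field) if field.strip() else None for field in rest.split(',')]


line = find_measurement('Distance AAAH', SAMPLE)
print(line)
print(parse_values(line))
```

Because the pattern depends only on the label, it does not matter which measurements come before or after the one requested, or whether the final field is blank.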

You can search the text with the following script:

#fullText = YOUR STRING
text = fullText.splitlines()
for line in text:
    if line.startswith('Distance AAAH:'):
        print(line)

Output:
Distance AAAH: ,0.11404,0.11450,0.00120,0.00110,-0.00046,,
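If several measurements need to be looked up in the same output, one possible variation on the loop above is to index every line by its label once. This is only a sketch: index_measurements, full_text and measurements are hypothetical names, and it assumes each record sits on its own line and begins with "Distance ".

```python
def index_measurements(full_text):
    """Map each label (for example 'Distance AAAH') to its full line."""
    by_label = {}
    for line in full_text.splitlines():
        # partition splits at the first colon; sep is '' when there is none.
        label, sep, _ = line.partition(':')
        if sep and label.startswith('Distance '):
            by_label[label] = line
    return by_label


# Three lines taken from the question's data.
full_text = (
    "Distance AAAB: ,0.13634,0.13700,0.00080,0.00080,-0.00066,.00001,\n"
    "Distance AAAH: ,0.11404,0.11450,0.00120,0.00110,-0.00046,,\n"
    "Distance AAAL: ,0.07689,0.07660,0.00050,0.00050,0.00029,\n"
)

measurements = index_measurements(full_text)
print(measurements.get('Distance AAAH'))
```

Using dict.get returns None for a label that is absent from a given data set, which suits the case where different projects contain different measurements.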

Can you give more examples of the kinds of queries you will be running? For example, will you always supply the full letter sequence? Will you use wildcards, such as A*L?

if 'AAAH' in line: is a very poor test unless you can explicitly guarantee that the data on the line will never contain AAAH. if line.startswith('Distance AAAH:'): would be safer and makes your intent clear. Instead of split('\n'), you can use splitlines(). Also, str is a terrible variable name, because it shadows the type.

@ShadowRanger Right, edited. But I think it is good for people to know if 'AAAH' in line: for cases where they need to search the whole line.

You probably want to use re.escape on the pattern you receive, unless you intend to let the caller pass and use regex special characters. If the pattern never contains regex special characters it is obviously not a problem, but most of the time that is a bad assumption.
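To illustrate the point about escaping, here is a small sketch comparing an unescaped and an escaped pattern. The label "Distance AAA." is a made-up example (the question's labels end in a letter), and search_raw and search_escaped are hypothetical helper names used only for this comparison.

```python
import re

# Hypothetical data: the second label contains a literal dot.
text = "Distance AAAH: ,0.11404,\nDistance AAA.: ,0.5,\n"


def search_raw(label, text):
    # The label is used as-is, so '.' acts as a wildcard for any character.
    return re.findall(r'^{0}:.*$'.format(label), text, re.MULTILINE)


def search_escaped(label, text):
    # re.escape turns regex metacharacters in the label into literals.
    return re.findall(r'^{0}:.*$'.format(re.escape(label)), text, re.MULTILINE)


print(search_raw('Distance AAA.', text))      # matches both lines
print(search_escaped('Distance AAA.', text))  # matches only the literal label
```

If callers are meant to pass wildcards on purpose, the unescaped form is the one to keep; otherwise escaping avoids surprising matches.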