Finding the words before and after a match in a Python list


This is related to the following question.

I have a string like this:

sentence = 'AASFG BBBSDC FEKGG SDFGF'

I split it and get a list of words like this:

sentence = ['AASFG', 'BBBSDC', 'FEKGG', 'SDFGF']

I use the code below to search for part of a word and get the whole word:

[word for word in sentence.split() if word.endswith("GG")]

It returns

['FEKGG']

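Now I need to find the words before and after that word.

For example, when I search for "GG", it returns

['FEKGG']

and it should also be able to give me: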

behind = 'BBBSDC'
infront = 'SDFGF'

Here is one possibility:

words = sentence.split()
[pos] = [i for (i, word) in enumerate(words) if word.endswith("GG") ]
behind = words[pos - 1]
infront = words[pos + 1]

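You may want to watch for edge cases, for example when "…GG" does not occur, occurs more than once, or is the first and/or last word. As written, no match or several matches raise a ValueError at the unpacking, and a match in last position raises an IndexError, which may well be the right behavior. A match in first position does not raise, though: words[pos - 1] becomes words[-1] and silently returns the last word.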
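If you would rather handle those edge cases explicitly, something along these lines might work. This is only a sketch: the function name find_neighbours and the choice to return None for a missing neighbour are my own assumptions, not part of the answer above.

```python
def find_neighbours(sentence, suffix):
    """Return (behind, word, infront) for the single word ending with suffix.

    Hypothetical helper: a missing neighbour is reported as None, and
    anything other than exactly one match raises ValueError.
    """
    words = sentence.split()
    positions = [i for i, word in enumerate(words) if word.endswith(suffix)]
    if len(positions) != 1:
        raise ValueError(f"expected exactly one match, found {len(positions)}")
    pos = positions[0]
    # Explicit bounds checks, so pos - 1 never wraps around to words[-1]
    behind = words[pos - 1] if pos > 0 else None
    infront = words[pos + 1] if pos + 1 < len(words) else None
    return behind, words[pos], infront


print(find_neighbours('AASFG BBBSDC FEKGG SDFGF', 'GG'))
```

Whether None or an exception is the better signal for a missing neighbour depends on how the caller uses the result.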

A completely different solution, using a regular expression, avoids splitting the string into an array in the first place:

import re
m = re.search(r'\b(\w+)\s+(?:\w+GG)\s+(\w+)\b', sentence)
(behind, infront) = m.groups()
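Note that re.search returns None when nothing matches, so calling .groups() on the result would then raise an AttributeError. A guarded variant might look like the sketch below. The function name neighbours_by_regex is made up for illustration, and like the pattern above it only matches when the target word has a neighbour on both sides.

```python
import re


def neighbours_by_regex(sentence, suffix):
    """Return (behind, infront) around the first word ending with suffix.

    Hypothetical helper: returns None when there is no match, including
    when the target is the first or last word.
    """
    # re.escape keeps characters such as '.' or '+' in the suffix literal
    pattern = r'\b(\w+)\s+\w+' + re.escape(suffix) + r'\s+(\w+)\b'
    m = re.search(pattern, sentence)
    return m.groups() if m else None


print(neighbours_by_regex('AASFG BBBSDC FEKGG SDFGF', 'GG'))
```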

If you have the following string (edited from the original):

this returns:

[('BBBSDC', 'FEKGG', 'SDFGF'), ('SDFGF', 'KETGG', None)]

Here is one way. If the "GG" word is at the beginning or end of the sentence, the element before or after it will be None.

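words = sentence.split()
# Pair every word with its predecessor and successor, padding both ends with None.
# Names follow the question: behind is the previous word, infront the next one.
[(behind, word, infront) for (behind, word, infront) in
 zip([None] + words[:-1], words, words[1:] + [None])
 if word.endswith("GG")]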

Output:

Behind: BBBSDC
Match: FEKGG
Infront: SDFGF

Behind: SDFGF
Match: AAABGG
Infront: FOOO

Behind: FOOO
Match: EEEGG
Infront: None

Another option, based on itertools, which may be more memory-friendly for large datasets:

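from itertools import tee

def sentence_targets(sentence, endstring):
    before, target, after = tee(sentence.split(), 3)
    # Offset the iterators so each zip step sees (previous, current, next).
    # Python 3 spelling: next(it) and the lazy built-in zip replace it.next() and izip.
    next(target, None)
    next(after, None)
    next(after, None)
    # Only full trigrams are produced, so a match in first or last position is skipped
    for trigram in zip(before, target, after):
        if trigram[1].endswith(endstring):
            yield trigram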

Edit: fixed a typo.
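The tee-based approach only ever sees complete trigrams, so a match that is the first or last word is never reported. If you need those as well, one possible variation pads the shifted iterators with None. This is a sketch under my own assumptions: the name padded_targets and the sample sentence are invented for illustration, and it takes any iterable of words rather than a string.

```python
from itertools import chain, islice, tee


def padded_targets(words, endstring):
    """Yield (previous, word, following) for each word ending with endstring.

    Hypothetical variant: a missing neighbour is reported as None.
    """
    before, target, after = tee(words, 3)
    before = chain([None], before)                  # lag by one: predecessor
    after = chain(islice(after, 1, None), [None])   # lead by one: successor
    for previous, word, following in zip(before, target, after):
        if word.endswith(endstring):
            yield previous, word, following


# Sample sentence made up for this example
for triple in padded_targets('AASFG BBBSDC FEKGG SDFGF KETGG'.split(), 'GG'):
    print(triple)
```

Because the three iterators stay within a couple of items of each other, tee only has to buffer a few words at a time, so this also works on a generator of words that is never fully held in memory.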

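This is exactly what you wanted.
AttributeError: 'itertools.tee' object has no attribute 'endswith'
If one of us has given you what you need, could you accept an answer?
Please accept a working answer!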
