Python: trying to find a clever way to find the indices of keywords in a given string


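I know there are plenty of threads about finding the index of a given keyword in a string, but my case is a bit different.

I have two inputs: a string and a mapping list (or whatever you want to call it).

s = "I am awesome and I love you"
mapping_list = "1 1 2 3 1 2 3"

Every word always maps to one number in the mapping list. Now I want to find all the indices in the string of the words that map to a given number, say 1.

In the case above, it would return [0, 2, 17] (thanks @rahlf23).

My current approach is to pair each word with its number by doing the following:

zip(mapping_list.split(' '), s.split(' '))

which gives me

('1', 'I')
('1', 'am')
('2', 'awesome')
('3', 'and')
('1', 'I')
('2', 'love')
('3', 'you')

Then I iterate over the list, find each '1', build a regex from that word, search for its index, and append it to a list or something. Rinse and repeat.

However, this seems really inefficient, especially when s gets really long.

I'm wondering whether there is a better way to handle this.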

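You can map the words to their len and use itertools.accumulate, although you have to add 1 to each length (for the space) and prepend an initial 0 for the start of the first word.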

>>> import itertools
>>> words = "I am awesome and I love you".split()
>>> mapping = list(map(int, "1 1 2 3 1 2 3".split()))
>>> start_indices = list(itertools.accumulate([0] + [len(w)+1 for w in words]))
>>> start_indices
[0, 2, 5, 13, 17, 19, 24, 28]

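The last element is not used. Then zip the mapping with the start indices, iterate over the pairs, and collect them into a dictionary.

>>> import collections
>>> d = collections.defaultdict(list)
>>> for x, y in zip(mapping, start_indices):
...     d[x].append(y)
>>> dict(d)
{1: [0, 2, 17], 2: [5, 19], 3: [13, 24]}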
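
For reference, the steps above can be folded into one function. This is only a sketch: the name indices_by_label is made up for illustration, labels are kept as strings rather than converted to int, and it assumes words are separated by exactly one space (otherwise the len(w) + 1 offsets drift).

```python
import collections
import itertools


def indices_by_label(s, mapping_list):
    """Group the start index of each word in s by its label in mapping_list.

    Hypothetical helper; assumes single-space separation in both inputs.
    """
    words = s.split(" ")
    labels = mapping_list.split(" ")
    # Each word starts where the previous one ended, plus one separating space.
    # The last word's length is not needed, so it is left out of the running sum.
    lengths = [len(w) + 1 for w in words[:-1]]
    starts = itertools.accumulate([0] + lengths)
    grouped = collections.defaultdict(list)
    for label, start in zip(labels, starts):
        grouped[label].append(start)
    return dict(grouped)


result = indices_by_label("I am awesome and I love you", "1 1 2 3 1 2 3")
print(result["1"])  # [0, 2, 17]
```

Building the whole dictionary once is useful if several labels are looked up for the same string; for a single label a filtered list comprehension over the same zip would do.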

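Alternatively, you can use a regular expression such as \b\w (a word boundary followed by a word character) to find every position where a word starts, and then proceed as above.

>>> import re
>>> s = "I am awesome and I love you"
>>> [m.start() for m in re.finditer(r"\b\w", s)]
[0, 2, 5, 13, 17, 19, 24]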
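
A possible variant of the regex route, sketched here with the hypothetical name starts_for_label: it matches runs of non-whitespace with \S+ instead of \b\w. That is a deliberate swap, not what the answer uses; \b\w also fires inside words such as "don't" (once at "d" and again at "t"), which would misalign the starts with the mapping list.

```python
import re


def starts_for_label(s, mapping_list, label):
    """Return start offsets of the words in s whose mapping entry equals label.

    Hypothetical helper; label is compared as a string, e.g. "1".
    """
    # One match per whitespace-delimited word, so repeated spaces are tolerated.
    word_starts = (m.start() for m in re.finditer(r"\S+", s))
    labels = mapping_list.split()
    return [start for lab, start in zip(labels, word_starts) if lab == label]


print(starts_for_label("I am awesome and I love you", "1 1 2 3 1 2 3", "1"))
# [0, 2, 17]
```

Because the offsets come from the matches themselves, this does not depend on every separator being a single space.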

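Are you doing this many times, or are you just worried about s getting long? This is probably close to optimal; the only improvement I see is turning s.split into a generator for constant memory. I somewhat doubt that it is a bottleneck, though.

I think it should return [0, 2, 17].

Or use a defaultdict.

@JChao Have you tried timing your solution with a very long string?

Assume there will be about 1k s, that each s will be 50 to 100 characters long after splitting, and that the list typically has 7 to 10 entries.

Thanks! These approaches look much cleaner than mine. I'm going to run some quick benchmarks, though.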
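
One rough way to run such a benchmark is timeit over synthetic inputs. Everything below is an assumption made for illustration: the helper name starts_by_accumulate, the random word lengths, and the three label values; only the batch size of about 1k strings and the 7 to 10 words per string come from the comment above.

```python
import itertools
import random
import timeit


def starts_by_accumulate(s, mapping_list, label):
    """Start offsets of words whose label matches (single-space input assumed)."""
    words = s.split(" ")
    starts = itertools.accumulate([0] + [len(w) + 1 for w in words[:-1]])
    return [i for lab, i in zip(mapping_list.split(" "), starts) if lab == label]


# Synthetic data: 1000 strings of 7 to 10 words each (word contents are arbitrary).
random.seed(0)
inputs = []
for _ in range(1000):
    n = random.randint(7, 10)
    words = ["x" * random.randint(1, 10) for _ in range(n)]
    labels = [str(random.randint(1, 3)) for _ in range(n)]
    inputs.append((" ".join(words), " ".join(labels)))


def run_all():
    for s, mapping_list in inputs:
        starts_by_accumulate(s, mapping_list, "1")


# Average wall-clock time of one pass over the whole batch.
elapsed = timeit.timeit(run_all, number=10)
print(f"{elapsed / 10:.6f} s per pass over {len(inputs)} strings")
```

Swapping another implementation into run_all gives a like-for-like comparison on the same inputs.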

import re

# Find the indices of all the word starts
word_starts = [0] + [m.start()+1 for m in re.finditer(' ', s)]

# Break the mapping list into an actual list
mapping = mapping_list.split(' ')

# Find the indices in the mapping list we care about
word_indices = [i for i, e in enumerate(mapping) if e == '1']

# Map those indices onto the word start indices
word_starts_at_indices = [word_starts[i] for i in word_indices]
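# Or you can do the last line the fancy way (needs `import operator`;
# gives a tuple, or a single bare value when word_indices has only one entry):
# word_starts_at_indices = operator.itemgetter(*word_indices)(word_starts)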