Exact word match using the index or find methods - Python

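I have the string "the then there" and I want to search for an exact/whole word; in this case, for example, "the" occurs only once. However, the index() and find() methods treat it as occurring three times, because "the" also partially matches "then" and "there". I would like to use either of these two methods; is there a way to tweak them so that this works?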

>>> s = "the then there"
>>> s.index("the")
0
>>> s.index("the",1)
4
>>> s.index("the",5)
9
>>> s.find("the")
0
>>> s.find("the",1)
4
>>> s.find("the",5)
9

First convert the string to a list of words using split(), then search for the word:

>>> s = "the then there"
>>> s_list = s.split() # list of words having content: ['the', 'then', 'there']
>>> s_list.index("the")
0
>>> s_list.index("then")
1
>>> s_list.index("there")
2
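
Note that the split() approach returns a position in the word list, not a character offset into the string. If a character offset is wanted while still relying on find(), one possible sketch is to keep calling find() and accept a hit only when the characters on both sides are not word characters. The helper name find_word is hypothetical, and "word character" is assumed here to mean a letter, digit, or underscore.

```python
def find_word(text, word, start=0):
    """Return the character index of the first whole-word match of `word`
    in `text` at or after `start`, or -1 if there is none.
    (Hypothetical helper, not part of the str API.)"""
    def is_word_char(ch):
        return ch.isalnum() or ch == "_"

    pos = text.find(word, start)
    while pos != -1:
        end = pos + len(word)
        # Accept the hit only if it is not glued to a neighbouring word character.
        before_ok = pos == 0 or not is_word_char(text[pos - 1])
        after_ok = end == len(text) or not is_word_char(text[end])
        if before_ok and after_ok:
            return pos
        # Otherwise it was a partial match; keep searching further along.
        pos = text.find(word, pos + 1)
    return -1


s = "the then there"
print(find_word(s, "the"))     # 0
print(find_word(s, "the", 1))  # -1, "then" and "there" are only partial matches
print(find_word(s, "there"))   # 9
```

This never builds a list of words, so the only large object in memory is the string itself.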

To find the first position of an exact/whole word in a large text, try the following approach, which uses re.search() and match.start():

import re

test_str = "when we came here, what we saw that the then there the"
search_str = 'the'
m = re.search(r'\b'+ re.escape(search_str) +r'\b', test_str, re.IGNORECASE)
if m:
    pos = m.start()
    print(pos)

Output:

36
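
If every whole-word occurrence is needed rather than only the first, the same pattern can be passed to re.finditer(). The following is a minimal sketch; the function name whole_word_positions is made up for illustration.

```python
import re


def whole_word_positions(text, word, flags=0):
    """Return the start offsets of all whole-word matches of `word` in `text`.
    (Hypothetical helper built on re.finditer.)"""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", flags)
    return [m.start() for m in pattern.finditer(text)]


test_str = "when we came here, what we saw that the then there the"
print(whole_word_positions(test_str, "the"))          # [36, 51]
print(whole_word_positions("the then there", "the"))  # [0]
```

Because \b only matches between a word character and a non-word character, this behaves as expected when the search word itself starts and ends with a letter, digit, or underscore.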

Use a regex with \b word boundaries on either side of the word.

Performance is an issue for my use case, because it may be a very large file, so I am trying to avoid creating a huge list…

It is a huge file anyway. You need to store it as a str or a list, but it has to be stored somewhere, right? Read the content as a string and form the list. If you are more concerned with saving space, once you have the list, convert it to a dictionary with each word as a key and the index of that word's first occurrence as the value. Then explicitly delete the variables you no longer use, such as the ones holding the string and the list.
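
The dictionary idea from the comment above could be sketched as follows. This variant is an assumption on my part rather than what the commenter wrote: it iterates over the file line by line instead of reading the whole string and building the full list first, so only the dictionary grows with the input. The name first_word_indexes is hypothetical, and io.StringIO stands in for an open file.

```python
import io


def first_word_indexes(lines):
    """Map each word to the word index of its first occurrence.
    `lines` is any iterable of strings, e.g. an open text file."""
    first_seen = {}
    index = 0
    for line in lines:  # only one line is held in memory at a time
        for word in line.split():
            # setdefault keeps the index already stored for a repeated word
            first_seen.setdefault(word, index)
            index += 1
    return first_seen


sample = io.StringIO("the then there\nthe there then\n")
positions = first_word_indexes(sample)
print(positions["the"])              # 0
print(positions["there"])            # 2
print(positions.get("missing", -1))  # -1
```

With a real file, the call would look like first_word_indexes(open(path)), where path is whatever file is being searched.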