Efficient way to check a list of strings and extract specific words in Python

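I am trying to check a list of 20,000 strings and compare each one against certain words/phrases so that I can sort them correctly into 3 categories.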

Here is a sample list of strings:

    sample = ["the empty bus behind me", "the facility is close", "my order was canceled", "no empty on site", "no bus for me to move"]

So I want to check whether a string contains:

    "empty" and "bus" and "empty" then emptyCount += 1

    "order canceled" or "canceled" then cancelcount += 1

    "empty" or "site" or "no empty on site" then site += 1

I have code that does this, but I don't think it is efficient, and it may actually be missing some key points. Any suggestions on how to go about this?

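    site = 0
    cancel_count = 0
    empty_count = 0
    count = 0  # strings that fall into none of the three categories
    for i in sample:
        # `"a" and "b" in i` only tests the last word, so each word needs its own `in`
        if "empty" in i and "bus" in i:
            empty_count += 1
        elif "canceled" in i:  # also covers "order canceled"
            cancel_count += 1
        elif "empty" in i or "site" in i:  # also covers "no empty on site"
            site += 1
        else:
            count += 1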
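One way to keep the three rules in a single pass is a small rule table checked in order, with `collections.Counter` tallying the results. This is a sketch: `RULES` and `categorize` are hypothetical names, and the word lists are taken from the conditions in the question. Because the table is checked top to bottom, it behaves like the `if`/`elif` chain (first match wins).

```python
from collections import Counter

sample = ["the empty bus behind me", "the facility is close",
          "my order was canceled", "no empty on site", "no bus for me to move"]

# Hypothetical rule table: (category, words that must ALL appear in the string).
# Order matters: the first matching rule wins, like an if/elif chain.
# A category with several alternatives gets one entry per alternative.
RULES = [
    ("empty", ("empty", "bus")),
    ("cancel", ("canceled",)),  # "order canceled" also contains "canceled"
    ("site", ("empty",)),       # "no empty on site" also contains "empty"
    ("site", ("site",)),
]

def categorize(text: str) -> str:
    """Return the first category whose words all occur in text, else 'other'."""
    for category, words in RULES:
        if all(word in text for word in words):
            return category
    return "other"

counts = Counter(categorize(s) for s in sample)
print(counts)
```

Adding a category then means adding a row to `RULES` rather than another `elif`, and each string is still scanned only until its first matching rule.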

You don't even need to extract anything.

All you need to do is search and increment a counter:

    sample = ["the empty bus behind me", "the facility is close", "my order was canceled", "no empty on site", "no bus for me to move"]

    empty_counter = 0
    for string_item in sample:
        if 'empty' in string_item:
            empty_counter += 1

    print(empty_counter)
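One caveat with the `in` check: it matches substrings, so `"site" in "visit our website"` is `True`. If whole-word matching is what you want (an assumption; the question doesn't say), here is a sketch using regex word boundaries (`\b`). `word_pattern` is a hypothetical helper name.

```python
import re

sample = ["the empty bus behind me", "the facility is close",
          "my order was canceled", "no empty on site", "no bus for me to move"]

def word_pattern(word: str) -> re.Pattern:
    """Hypothetical helper: compile a pattern matching `word` only as a whole word."""
    # \b is a word boundary, so "site" will not match inside "website"
    return re.compile(r"\b" + re.escape(word) + r"\b")

empty_re = word_pattern("empty")

# Count the strings containing the whole word "empty"
empty_counter = sum(1 for s in sample if empty_re.search(s))
print(empty_counter)  # 2

site_re = word_pattern("site")
print(bool(site_re.search("visit our website")))  # False
print("site" in "visit our website")              # True
```

Compiling each pattern once outside the loop avoids re-parsing the regex for each of the 20,000 strings.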

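If efficiency is what you're after, I'd suggest using pandas. It is a data-science package built to handle millions of rows, and depending on the size of your data it can make this noticeably faster.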

    # import the pandas package
    import pandas as pd

    sample = ["the empty bus behind me", "the facility is close", "my order was canceled", "no empty on site", "no bus for me to move"]

    # create a pandas Series
    sr = pd.Series(sample)

    # str.match() anchors at the start of the string and "&" is a literal
    # character in a regex, so test each word with str.contains() and combine
    results = sr.str.contains("empty") & sr.str.contains("bus")

    # sum() counts the True values; shape[0] would count every row
    print(results.sum())
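To get all three categories from pandas in one go, one option is to build a boolean mask per rule and let `numpy.select` pick the label of the first true condition in each row, which mirrors the `if`/`elif` order. This is a sketch; the mask names are hypothetical, and `str.contains` is used with its default regex mode so `"empty|site"` means "empty or site".

```python
import numpy as np
import pandas as pd

sample = ["the empty bus behind me", "the facility is close",
          "my order was canceled", "no empty on site", "no bus for me to move"]
sr = pd.Series(sample)

# One boolean mask per rule (str.contains treats the pattern as a regex by default)
is_empty = sr.str.contains("empty") & sr.str.contains("bus")
is_cancel = sr.str.contains("canceled")
is_site = sr.str.contains("empty|site")

# np.select takes the first True condition per row, like an if/elif chain;
# rows matching nothing get the default label
labels = np.select([is_empty, is_cancel, is_site],
                   ["empty", "cancel", "site"], default="other")

counts = pd.Series(labels).value_counts()
print(counts)
```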

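Can you share the code you are currently using? You say "I have code that does this"; please show it and explain how it is inefficient. If you want it to be faster, you could use threads.

Okay, I'll edit the question now and add my code. Thanks.

You don't need to use count. If you want to know how many strings are in the list, just use

    len(sample)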

Thanks, I know that; I was just wondering whether there is a way to do the search efficiently.

Thank you, this is another efficient way to do it.