Python pandas: search a text column for a list of keywords and flag the matches

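I have a bag of words stored as the elements of a list. I am trying to check, for each row of a text column in a pandas DataFrame, whether the text starts with any of the elements in the list. I tried both startswith and contains for the comparison.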

Code:

import pandas as pd
# list of words to search for
searchwords = ['harry','harry potter','secret garden']

# Data
l1 = [1, 2, 3,4,5]
l2 = ['Harry Potter is a great book',
      'Harry Potter is very famous',
      'I enjoyed reading Harry Potter series',
      'LOTR is also a great book along',
      'Have you read Secret Garden as well?'
]
df = pd.DataFrame({'id':l1,'text':l2})
df['text'] = df['text'].str.lower()

# Preview df:
    id  text
0   1   harry potter is a great book
1   2   harry potter is very famous
2   3   i enjoyed reading harry potter series
3   4   lotr is also a great book along
4   5   have you read secret garden as well?
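Attempt #1:

Attempt #2: When I run this, it returns nothing. Why? What am I doing wrong? It works when I search for 'harry' as a single string, but not when I pass in the list of elements.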

df[df['text'].str.startswith('harry')] # works with single string.
df[df['text'].str.startswith('|'.join(searchwords))] # returns nothing! 

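Because startswith accepts a plain string and not a regex, the joined pattern 'harry|harry potter|secret garden' is searched for literally and never matches. Use str.findall with an anchored pattern instead: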

df[df['text'].str.findall('^(?:'+'|'.join(searchwords) + ')').apply(len) > 0]
Output:

   id                          text
0   1  harry potter is a great book
1   2   harry potter is very famous
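As a possible variation on the regex approach above, Series.str.match already anchors the pattern at the start of each string, so the explicit ^ and the findall/len step are not strictly needed. The sketch below assumes the same sample data as the question; wrapping each keyword in re.escape is an addition that is not in the original answer, included so that keywords containing regex metacharacters are matched literally.

```python
import re

import pandas as pd

searchwords = ['harry', 'harry potter', 'secret garden']

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'text': [
        'Harry Potter is a great book',
        'Harry Potter is very famous',
        'I enjoyed reading Harry Potter series',
        'LOTR is also a great book along',
        'Have you read Secret Garden as well?',
    ],
})
df['text'] = df['text'].str.lower()

# Escape every keyword so characters such as '.' or '+' are treated literally,
# then join them into a single alternation.
pattern = '|'.join(re.escape(word) for word in searchwords)

# str.match uses re.match, which only matches at the beginning of each string.
mask = df['text'].str.match(pattern)
result = df[mask]
print(result)
```

With this data only the first two rows start with one of the keywords, matching the output shown above.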

Use startswith with a tuple.

Ex:

searchwords = ['harry','harry potter','secret garden']

# Data
l1 = [1, 2, 3,4,5]
l2 = ['Harry Potter is a great book',
      'Harry Potter is very famous',
      'I enjoyed reading Harry Potter series',
      'LOTR is also a great book along',
      'Have you read Secret Garden as well?'
]
df = pd.DataFrame({'id':l1,'text':l2})
df['text'] = df['text'].str.lower()

print(df[df['text'].str.startswith(tuple(searchwords))] )
   id                          text
0   1  harry potter is a great book
1   2   harry potter is very famous

You can pass a tuple to the startswith function to check for several words at once.

In your case, you can do this:

df['text'].str.startswith(tuple(searchwords))

Out:
0     True
1     True
2    False
3    False
4    False
Name: text, dtype: bool
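Since the question title asks to flag the matching rows, the boolean Series above can be stored as a new column. This is a minimal sketch under two assumptions: the column name starts_with_keyword is made up for illustration, and the installed pandas version accepts a tuple in Series.str.startswith (recent releases document this; some older ones reject it).

```python
import pandas as pd

searchwords = ['harry', 'harry potter', 'secret garden']

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'text': [
        'Harry Potter is a great book',
        'Harry Potter is very famous',
        'I enjoyed reading Harry Potter series',
        'LOTR is also a great book along',
        'Have you read Secret Garden as well?',
    ],
})
df['text'] = df['text'].str.lower()

# Hypothetical column name: True where the text starts with any keyword.
df['starts_with_keyword'] = df['text'].str.startswith(tuple(searchwords))

# The flag can then be used to select the matching rows.
flagged_ids = df.loc[df['starts_with_keyword'], 'id'].tolist()
print(df)
print(flagged_ids)
```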

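Comments:

Thanks for the prompt solution. Is the ^(?: part there for startswith? Curious how to do the same for endswith. I didn't know startswith can't take a regex.

Interesting, +1. Can you explain how startswith interprets the tuple?

That's interesting. This only works for startswith and endswith.

I never would have thought of using a tuple! Thanks.

This is great, but why only a tuple? I mean, why not just use the list? Is it because it was written that way, do tuples have some special property I don't know about, or am I missing something here?

@rakeshit That makes it clear.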
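On the comment questions about endswith and lists: Python's built-in str.startswith and str.endswith are defined to accept either a single string or a tuple of strings, and passing a list raises a TypeError. That is simply how the methods are specified, so the list has to be converted with tuple(). A small plain-Python sketch, where the sample text and the suffixes are made up for illustration:

```python
text = 'have you read secret garden as well?'
searchwords = ['harry', 'harry potter', 'secret garden']

# A tuple of prefixes: True if the text starts with any one of them.
starts = text.startswith(tuple(searchwords))

# endswith accepts a tuple of suffixes in the same way.
ends = text.endswith(('well?', 'book'))

# A list is rejected by the built-in method with a TypeError.
try:
    text.startswith(searchwords)
    list_accepted = True
except TypeError:
    list_accepted = False

print(starts, ends, list_accepted)
```

Here the text does not start with any keyword, it does end with 'well?', and the list argument is rejected.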