Python pandas: check whether values from column A are contained in column B
I have 100 keywords in df1 and 10,000 articles in df2. I want to count how many articles contain each keyword. For example, about 20 articles contain the keyword "apple". I tried df.str.contains(), but then I have to run the count separately for every keyword. Can you suggest an efficient way to do this?
import pandas as pd
df1 = pd.DataFrame(['apple', 'mac', 'pc', 'ios', 'lg'], columns=['keywords'])
df2 = pd.DataFrame(['apple is good for health', 'mac is another pc', 'today is sunday', 'Star wars pc game', 'ios is a system,lg is not', 'lg is a japan company '], columns=['article'])
Expected result:
1 article contain "apple"
1 article contain 'mac'
2 article contain 'pc'
1 article contain "ios"
2 article contain 'lg'
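I think you need to call sum on the boolean Series returned by str.contains to count the True values, since each True is treated like 1. Use a list comprehension to do this for all keywords, then pass the result to the DataFrame constructor: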
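# one (keyword, count) pair per keyword; note that str.contains treats x as a regex by default
L = [(x, df2['article'].str.contains(x).sum()) for x in df1['keywords']]
# alternative solution: plain substring test with `in` (only works if there are no NaNs)
#L = [(x, sum(x in article for article in df2['article'])) for x in df1['keywords']]
df3 = pd.DataFrame(L, columns=['keyword', 'count'])
print(df3)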
  keyword  count
0   apple      1
1     mac      1
2      pc      2
3     ios      1
4      lg      2
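One thing to watch with the approach above: str.contains interprets the keyword as a regular expression by default and returns NaN for missing articles. The following is a minimal sketch, not part of the original answer, that assumes plain substring matching is what you want. The helper name count_articles and the 'c++' example are hypothetical and only there for illustration.

```python
import pandas as pd

df1 = pd.DataFrame(['apple', 'mac', 'pc', 'ios', 'lg'], columns=['keywords'])
df2 = pd.DataFrame(['apple is good for health', 'mac is another pc',
                    'today is sunday', 'Star wars pc game',
                    'ios is a system,lg is not', 'lg is a japan company '],
                   columns=['article'])


def count_articles(keyword, articles):
    """Count how many entries of `articles` contain `keyword` as a literal substring.

    Hypothetical helper for illustration only.
    regex=False -> the keyword is matched literally, so '+', '.', '(' are safe
    na=False    -> missing articles count as "no match" instead of NaN
    """
    return int(articles.str.contains(keyword, regex=False, na=False).sum())


# Same result as the list comprehension in the answer, built with the helper
df3 = pd.DataFrame(
    [(k, count_articles(k, df2['article'])) for k in df1['keywords']],
    columns=['keyword', 'count'],
)
print(df3)

# A keyword with regex metacharacters and a missing article (made-up data)
print(count_articles('c++', pd.Series(['c++ is fast', None])))
```

With the default regex=True, a keyword such as 'c++' would raise a regex error, which is why the literal match may be the safer choice for user-supplied keyword lists.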
If you only need to print the output:
for x in df1['keywords']:
    count = df2['article'].str.contains(x).sum()
    # another solution if there are no NaNs: sum over a generator that checks membership with `in`
    #count = sum(x in article for article in df2['article'])
    print('{} article contain "{}"'.format(count, x))
1 article contain "apple"
1 article contain "mac"
2 article contain "pc"
1 article contain "ios"
2 article contain "lg"
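Since the question asks for something more efficient than one str.contains call per keyword, here is a hedged sketch of a single-pass variant. It is not from the original answer: it joins all keywords into one escaped regex alternation, finds the matches per article with str.findall, and counts each keyword at most once per article. It assumes the keywords do not overlap one another (for example 'app' and 'apple'), because an alternation only reports one match per position and could undercount in that case. Whether it is actually faster on your data is something to measure.

```python
import re

import pandas as pd

df1 = pd.DataFrame(['apple', 'mac', 'pc', 'ios', 'lg'], columns=['keywords'])
df2 = pd.DataFrame(['apple is good for health', 'mac is another pc',
                    'today is sunday', 'Star wars pc game',
                    'ios is a system,lg is not', 'lg is a japan company '],
                   columns=['article'])

keywords = df1['keywords'].tolist()

# One pattern for all keywords; re.escape keeps regex metacharacters literal
pattern = '|'.join(re.escape(k) for k in keywords)

# List of matched keywords per article (missing articles are treated as empty text)
found = df2['article'].fillna('').str.findall(pattern)

# Deduplicate within each article so a keyword counts once per article,
# then flatten to one row per (article, keyword) and count occurrences
counts = (
    found.map(lambda matches: sorted(set(matches)))
         .explode()
         .value_counts()
         .reindex(keywords, fill_value=0)  # keep keyword order, 0 for no matches
)

df3 = pd.DataFrame({'keyword': keywords, 'count': counts.tolist()})
print(df3)
```

For the sample data this gives the same counts as the per-keyword loop. If your keywords can overlap, the per-keyword str.contains approach remains the safer option.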
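@安迪亚丹 You're welcome! If my answer was helpful, don't forget to accept it: click the check mark next to the answer so it changes from grey to filled in. Thanks.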