Python pandas: check which values in column A are contained in column B



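I have 100 keywords in df1 and 10,000 articles in df2. I want to count how many articles contain each keyword. For example, about 20 articles contain the keyword "apple".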

I tried using df.str.contains(), but then I have to run it separately for every keyword. Can you suggest an efficient way?

import pandas as pd

df1 = pd.DataFrame(['apple', 'mac', 'pc', 'ios', 'lg'], columns=['keywords'])
df2 = pd.DataFrame(['apple is good for health', 'mac is another pc', 'today is sunday',
                    'Star wars pc game', 'ios is a system,lg is not', 'lg is a japan company '],
                   columns=['article'])
Expected result:

1 article contains "apple"
1 article contains "mac"
2 articles contain "pc"
1 article contains "ios"
2 articles contain "lg"
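I think you need sum on a boolean Series to count the True values (each True is counted as 1). To do this for all keywords, use a list comprehension and then the DataFrame constructor: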

# one (keyword, number of articles containing it) tuple per keyword
L = [(x, df2['article'].str.contains(x).sum()) for x in df1['keywords']]
# alternative solution using plain Python substring checks
#L = [(x, sum(x in article for article in df2['article'])) for x in df1['keywords']]
df3 = pd.DataFrame(L, columns=['keyword', 'count'])
print(df3)
  keyword  count
0   apple      1
1     mac      1
2      pc      2
3     ios      1
4      lg      2
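The count works because str.contains returns a boolean Series, and sum treats True as 1 and False as 0. A minimal sketch on the sample data from the question; it additionally passes regex=False (not used in the original answer) so each keyword is matched as a literal substring rather than as a regular expression, which avoids errors or surprises if real keywords contain characters such as '+', '.' or '(':

```python
import pandas as pd

df1 = pd.DataFrame(['apple', 'mac', 'pc', 'ios', 'lg'], columns=['keywords'])
df2 = pd.DataFrame(['apple is good for health', 'mac is another pc', 'today is sunday',
                    'Star wars pc game', 'ios is a system,lg is not', 'lg is a japan company '],
                   columns=['article'])

# str.contains gives one True/False per article; summing counts the True values
mask = df2['article'].str.contains('pc', regex=False)
print(mask.tolist())    # [False, True, False, True, False, False]
print(int(mask.sum()))  # 2

# regex=False: every keyword is treated as a literal substring, not a pattern
counts = pd.Series(
    {kw: int(df2['article'].str.contains(kw, regex=False).sum()) for kw in df1['keywords']},
    name='count',
)
print(counts)
```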
If you only need printed output:

for x in df1['keywords']:
    count = df2['article'].str.contains(x).sum()
    # alternative if there are no NaNs: sum over a generator that checks membership with `in`
    #count = sum(x in article for article in df2['article'])
    print('{} article contain "{}"'.format(count, x))

1 article contain "apple"
1 article contain "mac"
2 article contain "pc"
1 article contain "ios"
2 article contain "lg"
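Both versions above match substrings, so "pc" would also be counted inside a word such as "pcs", and "lg" inside "algebra". If only whole-word matches should count, one option (a different technique from the answer, shown here as a sketch) is to build a single regular expression from all keywords, extract the matches per article with str.findall, and count each keyword at most once per article:

```python
import re
import pandas as pd

df1 = pd.DataFrame(['apple', 'mac', 'pc', 'ios', 'lg'], columns=['keywords'])
df2 = pd.DataFrame(['apple is good for health', 'mac is another pc', 'today is sunday',
                    'Star wars pc game', 'ios is a system,lg is not', 'lg is a japan company '],
                   columns=['article'])

keywords = df1['keywords'].tolist()

# One alternation of all keywords: re.escape keeps them literal,
# \b on both sides requires a whole-word match
pattern = r'\b(' + '|'.join(map(re.escape, keywords)) + r')\b'

# For each article, the list of keywords found in it (possibly with repeats)
found = df2['article'].str.findall(pattern)

# Keep each keyword once per article, then count articles per keyword;
# explode turns empty lists into NaN, which value_counts drops by default
per_article = found.apply(lambda hits: sorted(set(hits)))
counts = per_article.explode().value_counts().reindex(keywords, fill_value=0)
print(counts)
```

Because findall returns non-overlapping matches, a keyword nested inside a longer keyword that matches at the same place (for example a hypothetical pair "new york" and "york") is only counted as the longer one there; the per-keyword loop does not have this limitation.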
