How can I compare two columns in Python and count how many items/strings they have in common in a DataFrame?
Tags: python, pandas, dataframe
For example:
row | column A | column B
=========================
1 | ['NNP', 'NNP', 'NNP', 'NNP', 'CC', 'RB', 'NN', 'Z'] | ['NNP', 'NNP', 'VB', 'NN', 'NN', 'Z']
2 | ['NNP', 'VB', 'NN', 'NN', 'Z'] | ['NNP', 'NN', 'VB']
What I want to get is:
row | column A | column B | count_same_string
=============================================
1 | ['NNP', 'NNP', 'NNP', 'NNP', 'CC', 'RB', 'NN', 'Z'] | ['NNP', 'NNP', 'VB', 'NN', 'NN', 'Z'] | 4
2 | ['NNP', 'VB', 'NN', 'NN', 'Z'] | ['NNP', 'NN', 'RB'] | 2
You can achieve this with the following code:
df['count_same_string'] = df.apply(lambda row: len(set(row['column A']).intersection(row['column B'])), axis=1)
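One thing worth checking with the set-based approach is what it actually counts. The sketch below rebuilds the question's two rows (the column name `distinct_shared` and the helper `count_distinct_shared` are made up for illustration) and applies the intersection row by row. Because a `set` discards duplicates, it counts distinct shared tags, so it may not reproduce the 4 expected for the first row.

```python
import pandas as pd

# Sample data copied from the question's input table.
df = pd.DataFrame({
    "column A": [["NNP", "NNP", "NNP", "NNP", "CC", "RB", "NN", "Z"],
                 ["NNP", "VB", "NN", "NN", "Z"]],
    "column B": [["NNP", "NNP", "VB", "NN", "NN", "Z"],
                 ["NNP", "NN", "VB"]],
})

def count_distinct_shared(row):
    # Hypothetical helper: a set keeps each tag once, so repeated tags
    # such as 'NNP' contribute only 1 to the count.
    return len(set(row["column A"]) & set(row["column B"]))

# axis=1 passes each row (not each column) to the function.
df["distinct_shared"] = df.apply(count_distinct_shared, axis=1)
print(df["distinct_shared"].tolist())  # [3, 3]
```

If repeated tags should be counted as many times as they appear in both lists, a multiset intersection with `collections.Counter` is the closer fit.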
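You can adapt the answer to use a list comprehension and take the length:
import pandas as pd
from collections import Counter
df = pd.DataFrame({"column A": [["NNP", "NNP", "NNP", "NNP", "CC", "RB", "NN", "Z"]],
                   "column B": [["NNP", "NNP", "VB", "NN", "NN", "Z"]]})
# Counter(a) & Counter(b) keeps each tag with the smaller of its two counts
df["result"] = [len(list((Counter(a) & Counter(b)).elements()))
                for a, b in zip(df["column A"], df["column B"])]
print(df)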
column A column B result
0 [NNP, NNP, NNP, NNP, CC, RB, NN, Z] [NNP, NNP, VB, NN, NN, Z] 4
How did you get the answer for the first row? VB and RB shouldn't count as the same, should they??
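To see where the 4 for the first row comes from, it may help to print the intermediate `Counter` intersection. This is only a sketch on the first row's lists (the variable names `col_a`, `col_b`, and `shared` are chosen for illustration): `&` keeps each tag with the minimum of its two counts, so `VB` and `RB` are not matched with each other and simply drop out.

```python
from collections import Counter

# First row of the question's example.
col_a = ["NNP", "NNP", "NNP", "NNP", "CC", "RB", "NN", "Z"]
col_b = ["NNP", "NNP", "VB", "NN", "NN", "Z"]

# Multiset intersection: for each tag, min(count in A, count in B).
shared = Counter(col_a) & Counter(col_b)

# 'NNP' appears 4 times in A and 2 times in B, so it contributes 2;
# 'NN' and 'Z' contribute 1 each; 'VB', 'RB' and 'CC' contribute 0.
print(dict(shared))
print(sum(shared.values()))  # 4
```

Here `sum(shared.values())` gives the same number as `len(list(shared.elements()))` in the answer above, without building the intermediate list.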