How to get the intersection of two pandas Series text columns?

python python-3.x pandas

I have two pandas Series whose values are sets of words. How can I get their row-by-row intersection?

print(df)

0  {this, is, good}
1  {this, is, not, good}

print(df1)

0  {this, is}
1  {good, bad}

I am looking for output like this:

print(df2)

0  {this, is}
1  {good}

I tried the following, but it raises an error:

df.apply(lambda x: x.intersection(df1))
TypeError: unhashable type: 'set'
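The error happens because set.intersection() iterates over its argument and hashes each item. Iterating over a Series yields its values, which are sets here, and sets are unhashable. The sketch below reproduces the error with the question's sample data and then pairs the rows up so each set is intersected only with its counterpart. It assumes both Series have the same length and row order.

```python
import pandas as pd

s1 = pd.Series([{"this", "is", "good"}, {"this", "is", "not", "good"}])
s2 = pd.Series([{"this", "is"}, {"good", "bad"}])

# x.intersection(s2) iterates over s2, whose items are sets; hashing a set fails.
try:
    s1.apply(lambda x: x.intersection(s2))
except TypeError as exc:
    print("TypeError:", exc)

# Pair each set with its counterpart by position and intersect the pair.
result = pd.Series([a & b for a, b in zip(s1, s2)], index=s1.index)
print(result)
```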

This approach works for me:

import pandas as pd
import numpy as np

data = np.array([{'this', 'is', 'good'},{'this', 'is', 'not', 'good'}])
data1 = np.array([{'this', 'is'},{'good', 'bad'}])
df = pd.Series(data)
df1 = pd.Series(data1)

# Intersect the sets row by row (range, since xrange does not exist in Python 3)
df2 = pd.Series([df[i] & df1[i] for i in range(df.size)])
print(df2)
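The loop above indexes both Series with i running from 0 to df.size - 1, so it assumes both use the default 0..n-1 index. If the indexes might differ, the sketch below aligns on index labels with Series.combine instead. The non-default labels "a" and "b" are made up purely for illustration. The sketch also assumes every label exists in both Series.

```python
import pandas as pd

s1 = pd.Series([{"this", "is", "good"}, {"this", "is", "not", "good"}], index=["a", "b"])
s2 = pd.Series([{"good", "bad"}, {"this", "is"}], index=["b", "a"])

# combine() aligns both Series on the union of their index labels and calls
# the function once per label with the two matching values.
result = s1.combine(s2, lambda a, b: a & b)
print(result)
```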

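It is simple set logic: A - (A - B) equals the intersection of A and B, and the subtraction is applied element-wise: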

s1 = pd.Series([{'this', 'is', 'good'}, {'this', 'is', 'not', 'good'}])
s2 = pd.Series([{'this', 'is'}, {'good', 'bad'}])
s1 - (s1 - s2)  
#Out[122]: 
#0    {this, is}
#1        {good}
#dtype: object
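Here is why this works. Removing from A everything that is not in B leaves exactly the elements A shares with B. On object-dtype Series, the - operator is applied to each pair of sets. The sketch below checks that identity against a direct element-wise intersection on the same sample data.

```python
import pandas as pd

s1 = pd.Series([{"this", "is", "good"}, {"this", "is", "not", "good"}])
s2 = pd.Series([{"this", "is"}, {"good", "bad"}])

# Element-wise set difference twice: A - (A - B) keeps only A's elements that are in B.
via_difference = s1 - (s1 - s2)

# Direct element-wise intersection for comparison.
direct = [a & b for a, b in zip(s1, s2)]

print(list(via_difference) == direct)  # True
```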

Similar to the above, but if you want to keep everything in a single DataFrame:

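import numpy as np
import pandas as pd

# Current df: column 0 and column 1 each hold one set per row
df = pd.DataFrame({0: np.array([{'this', 'is', 'good'},{'this', 'is', 'not', 'good'}]), 1: np.array([{'this', 'is'},{'good', 'bad'}])})

# Intersection of columns 0 and 1, stored in a new column 2
df[2] = df.apply(lambda x: x[0] & x[1], axis=1)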
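DataFrame.apply with axis=1 builds a Series object for every row, which tends to be slow on large frames. The sketch below computes the same column 2 with a plain comprehension over columns 0 and 1, using the same sample data. Any speedup depends on your data, so measure it on your own frame.

```python
import pandas as pd

df = pd.DataFrame({
    0: [{"this", "is", "good"}, {"this", "is", "not", "good"}],
    1: [{"this", "is"}, {"good", "bad"}],
})

# Walk both columns in lockstep; no per-row Series objects are created.
df[2] = [a & b for a, b in zip(df[0], df[1])]
print(df)
```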

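Thanks for the answers above. Here is a simple way to solve the same problem if you have DataFrames (judging by variable names like df and df1, I guess you were asking about DataFrames).

This one-liner

df.apply(lambda row: row[0].intersection(df1.loc[row.name][0]), axis=1)

will do it. Let's see how I arrived at the solution.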

That answer was very helpful to me at the time.


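It works, but it runs slowly on my huge dataset. Thanks!

How can I get the union with the above method?

I tried to answer this in my own way, keeping to the point you made in the question: df and df1 look like DataFrames judging by their names, but the answers work on Series. So I wanted to use intersection() to answer the question with DataFrames as well.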

>>> import pandas as pd

>>> 
>>> df = pd.DataFrame({
...     "set": [{"this", "is", "good"}, {"this", "is", "not", "good"}]
... })
>>> 
>>> df
                     set
0       {this, is, good}
1  {not, this, is, good}
>>> 
>>> df1 = pd.DataFrame({
...     "set": [{"this", "is"}, {"good", "bad"}]
... })
>>> 
>>> df1
           set
0   {this, is}
1  {bad, good}
>>>
>>> df.apply(lambda row: row[0].intersection(df1.loc[row.name][0]), axis=1)
0    {this, is}
1        {good}
dtype: object
>>>
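The same row-wise pattern also answers the union question from the comments: call union instead of intersection. This sketch additionally uses .iloc[0] to take the first column by position. Recent pandas versions deprecate treating an integer key like row[0] as a position on a row whose labels are column names (here the column is named "set"). The variable names inter and union are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"set": [{"this", "is", "good"}, {"this", "is", "not", "good"}]})
df1 = pd.DataFrame({"set": [{"this", "is"}, {"good", "bad"}]})

# row.name is the row's index label; df1.loc[row.name] fetches the matching row.
# .iloc[0] picks the first column by position instead of relying on row[0].
inter = df.apply(lambda row: row.iloc[0].intersection(df1.loc[row.name].iloc[0]), axis=1)
union = df.apply(lambda row: row.iloc[0].union(df1.loc[row.name].iloc[0]), axis=1)

print(inter)
print(union)
```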

>>> df.apply(lambda x: print(x.name), axis=1)
0
1
0    None
1    None
dtype: object
>>> 
>>> df.loc[0]
set    {this, is, good}
Name: 0, dtype: object
>>> 
>>> df.apply(lambda row: print(row[0]), axis=1)
{'this', 'is', 'good'}
{'not', 'this', 'is', 'good'}
0    None
1    None
dtype: object
>>> 
>>> df.apply(lambda row: print(type(row[0])), axis=1)
<class 'set'>
<class 'set'>
0    None
1    None
dtype: object
>>> df.apply(lambda row: print(type(row[0]), df1.loc[row.name]), axis=1)
<class 'set'> set    {this, is}
Name: 0, dtype: object
<class 'set'> set    {good}
Name: 1, dtype: object
0    None
1    None
dtype: object
>>> df.apply(lambda row: print(type(row[0]), type(df1.loc[row.name])), axis=1)
<class 'set'> <class 'pandas.core.series.Series'>
<class 'set'> <class 'pandas.core.series.Series'>
0    None
1    None
dtype: object
>>> df.apply(lambda row: print(type(row[0]), type(df1.loc[row.name][0])), axis=1)
<class 'set'> <class 'set'>
<class 'set'> <class 'set'>
0    None
1    None
dtype: object
>>>
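The inspection above confirms two things: row.name is the row's index label, and df1.loc[row.name] returns the matching row of df1. The apply version therefore does one .loc lookup per row. For large frames, the sketch below aligns the two frames once with DataFrame.join and then intersects the paired columns. The suffixes "_a" and "_b" are arbitrary names chosen here, and the sketch assumes both frames share the same index labels.

```python
import pandas as pd

df = pd.DataFrame({"set": [{"this", "is", "good"}, {"this", "is", "not", "good"}]})
df1 = pd.DataFrame({"set": [{"this", "is"}, {"good", "bad"}]})

# Align df1 to df's index in one step; both frames have a "set" column,
# so suffixes are needed to tell the two columns apart.
joined = df.join(df1, lsuffix="_a", rsuffix="_b")

# Intersect the paired sets row by row.
result = pd.Series(
    [a & b for a, b in zip(joined["set_a"], joined["set_b"])],
    index=joined.index,
)
print(result)
```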