Python: How do I combine two list columns when merging DataFrames?


python pandas merge


I have two DataFrames:

df1:

       date        ids
0   2015-10-13       [978]
1   2015-10-14  [978, 121]

df2:

       date        ids
0   2015-10-13  [978, 12]
1   2015-10-14     [2, 1]

When I merge them on date, like this:
df = pandas.merge(df1, df2, on='date', sort=False)

I get the following DataFrame:
   date            ids_x             ids_y
0   2015-10-13    [978]            [978, 12]
1   2015-10-14    [978, 121]       [2, 1]
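To make the question reproducible, here is a minimal sketch that rebuilds df1 and df2 from the tables above and merges them. Storing the dates as plain strings is an assumption, since the question does not show dtypes. Because both frames have a non-key column called ids, merge appends its default suffixes '_x' and '_y', which is where ids_x and ids_y come from.

```python
import pandas as pd

# Rebuild the two frames from the question; dates kept as strings (assumption)
df1 = pd.DataFrame({'date': ['2015-10-13', '2015-10-14'],
                    'ids': [[978], [978, 121]]})
df2 = pd.DataFrame({'date': ['2015-10-13', '2015-10-14'],
                    'ids': [[978, 12], [2, 1]]})

# Both frames have an 'ids' column, so merge disambiguates them with the
# default suffixes ('_x', '_y')
df = pd.merge(df1, df2, on='date', sort=False)
print(df)
```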

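I want to combine the two lists into a single column, e.g. [978, 978, 12], or preferably remove the duplicates and end up with something like [978, 12].

Solution: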
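import pandas

df = pandas.merge(df1, df2, on='date', sort=False)
# '+' on two columns of lists concatenates the lists row by row
df['ids'] = df['ids_x'] + df['ids_y']
# newer pandas versions require axis as a keyword argument
df = df.drop(['ids_x', 'ids_y'], axis=1)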
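A slightly shorter variant of the same combine-and-drop step can be sketched with DataFrame.pop, which removes a column and returns it, so the concatenation and the drop happen in one statement. The sample frames are rebuilt from the question.

```python
import pandas as pd

df1 = pd.DataFrame({'date': ['2015-10-13', '2015-10-14'],
                    'ids': [[978], [978, 121]]})
df2 = pd.DataFrame({'date': ['2015-10-13', '2015-10-14'],
                    'ids': [[978, 12], [2, 1]]})
df = pd.merge(df1, df2, on='date', sort=False)

# pop() removes each suffixed column from df and returns it as a Series;
# '+' on two object Series of lists concatenates the lists row by row
df['ids'] = df.pop('ids_x') + df.pop('ids_y')
print(df)
```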

To remove duplicates from the combined list, use apply:
df['ids'] = df.apply(lambda row: list(set(row['ids'])), axis=1)
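A set does not keep the order in which ids first appear. If that order matters, one alternative (a different technique, not part of the solution above) is dict.fromkeys, because dicts preserve insertion order in Python 3.7+. A minimal sketch, starting from the already concatenated column:

```python
import pandas as pd

# Concatenated lists as produced by ids_x + ids_y in the question
df = pd.DataFrame({'date': ['2015-10-13', '2015-10-14'],
                   'ids': [[978, 978, 12], [978, 121, 2, 1]]})

# dict.fromkeys keeps only the first occurrence of each id, in original order
df['ids'] = df['ids'].apply(lambda x: list(dict.fromkeys(x)))
print(df['ids'].tolist())  # [[978, 12], [978, 121, 2, 1]]
```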

You can add the two columns together to get the list you are looking for, then use df.drop() with axis=1 to remove the ids_x and ids_y columns. Example:
import pandas as pd

df = pd.merge(df1, df2, on='date', sort=False)
df['ids'] = df['ids_x'] + df['ids_y']
df = df.drop(['ids_x', 'ids_y'], axis=1)


Demo:
In [65]: df
Out[65]:
         date       ids_x      ids_y
0  2015-10-13       [978]  [978, 12]
1  2015-10-14  [978, 121]     [2, 1]

In [67]: df['ids'] = df['ids_x'] + df['ids_y']

In [68]: df
Out[68]:
         date       ids_x      ids_y               ids
0  2015-10-13       [978]  [978, 12]    [978, 978, 12]
1  2015-10-14  [978, 121]     [2, 1]  [978, 121, 2, 1]

In [70]: df = df.drop(['ids_x','ids_y'],axis=1)

In [71]: df
Out[71]:
         date               ids
0  2015-10-13    [978, 978, 12]
1  2015-10-14  [978, 121, 2, 1]


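If you also want to remove duplicate values and do not care about the order, you can use Series.apply to convert each list to a set and then back to a list. Example: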
df['ids'] = df['ids'].apply(lambda x: list(set(x)))
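If you would rather avoid a per-row lambda, another route (a swapped-in technique, not the one in this answer) is to explode the lists into one row per id, drop repeated (date, id) pairs, and gather them back with groupby, which also keeps first-occurrence order. This sketch assumes pandas 0.25 or later (for DataFrame.explode) and that each date appears in only one row, as in the question.

```python
import pandas as pd

df = pd.DataFrame({'date': ['2015-10-13', '2015-10-14'],
                   'ids': [[978, 978, 12], [978, 121, 2, 1]]})

deduped = (
    df.explode('ids')            # one row per (date, id)
      .drop_duplicates()         # drop repeated (date, id) pairs
      .groupby('date')['ids']
      .agg(list)                 # gather the ids back into a list per date
      .reset_index()
)
print(deduped)
```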

Demo:
In [72]: df['ids'] = df['ids'].apply(lambda x: list(set(x)))

In [73]: df
Out[73]:
         date               ids
0  2015-10-13         [978, 12]
1  2015-10-14  [121, 978, 2, 1]


Alternatively, as mentioned in the comments, if you want to use numpy.unique(), you can use it with Series.apply as well:
import numpy as np
df['ids'] = df['ids'].apply(lambda x: np.unique(x))

Demo:
In [79]: df['ids'] = df['ids'].apply(lambda x: np.unique(x))

In [80]: df
Out[80]:
         date               ids
0  2015-10-13         [12, 978]
1  2015-10-14  [1, 2, 121, 978]
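np.unique returns a sorted NumPy array, not a Python list. If later code expects lists (for instance, to keep using + to concatenate, which on arrays would add element-wise instead), a small addition is to convert the result back with .tolist():

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': ['2015-10-13', '2015-10-14'],
                   'ids': [[978, 978, 12], [978, 121, 2, 1]]})

# np.unique sorts and deduplicates; .tolist() turns the array back into a list
df['ids'] = df['ids'].apply(lambda x: np.unique(x).tolist())
print(df['ids'].tolist())  # [[12, 978], [1, 2, 121, 978]]
```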

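Using numpy.unique(). Thanks for your time, +1.
@AlirezaHos Note that in my tests set was a bit faster than numpy.unique.
You're right: numpy.unique took 0.16 seconds while set took 0.05 seconds for 30,000 ids. Very fast. Thanks for your answer, +1.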
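To repeat the set vs. numpy.unique comparison from the comments on your own machine, here is a rough timeit sketch. The 30,000 random ids and their value range are hypothetical data chosen only for illustration, and the helper names dedup_set and dedup_unique are made up for this example; timings depend on hardware and data, so none are claimed here.

```python
import random
import timeit

import numpy as np

# Hypothetical data: 30,000 ids drawn from a range that guarantees duplicates
random.seed(0)
ids = [random.randint(0, 10_000) for _ in range(30_000)]


def dedup_set(values):
    # Unordered deduplication via a Python set
    return list(set(values))


def dedup_unique(values):
    # Sorted deduplication via NumPy (the list is converted to an array first)
    return np.unique(values).tolist()


# Both helpers yield the same distinct ids, possibly in a different order
assert sorted(dedup_set(ids)) == dedup_unique(ids)

for func in (dedup_set, dedup_unique):
    seconds = timeit.timeit(lambda: func(ids), number=100)
    print(f'{func.__name__}: {seconds / 100:.6f} s per call')
```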