Python pandas: how to check whether ids appear in a list-type column of a DataFrame




How do I create a new list column from an existing list column?

My dataframe:

id    x    list_id
1     20   [2, 4]
2     10   [1, 3]
3     10   [1]
4     30   [1, 2]

What I want:

id    x    list_id    list_x
1     20   [2, 4]     [10, 30]
2     10   [1, 3]     [20, 10]
3     10   [1]        [20]
4     30   [1, 2]     [20, 10]

My first idea was to iterate over each row and check whether each id is in that row's list:

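for index, row in df.iterrows():
    # Fails: isin() returns a boolean Series, and using a Series in `if`
    # raises "ValueError: The truth value of a Series is ambiguous"
    if df['id'].isin(row['list_id']):
        do_something  # placeholder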

But it doesn't work. Any suggestions?
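For comparison, here is a minimal sketch of the row-by-row idea made to work (an illustration, not taken from the answers). The original `if` fails because `df['id'].isin(...)` returns a whole boolean Series, which Python cannot reduce to a single True/False. Using that Series as a row selector instead collects the matching `x` values:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'x': [20, 10, 10, 30],
    'list_id': [[2, 4], [1, 3], [1], [1, 2]],
})

list_x = []
for index, row in df.iterrows():
    # isin() returns one boolean per row of df; use it to select rows
    # rather than as an `if` condition
    mask = df['id'].isin(row['list_id'])
    list_x.append(df.loc[mask, 'x'].tolist())

df['list_x'] = list_x
print(df)
```

This keeps the asker's structure, but `iterrows` plus a full `isin` scan per row is quadratic, so the vectorized answers are likely preferable on larger frames.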

Use a list comprehension:

df['list_x'] = [df.x[df['id'].isin(l)].tolist() for l in df.list_id]
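To see what the comprehension computes, here is one iteration worked through by hand for the first row (`list_id == [2, 4]`), using the same dummy data as the full example: `isin` marks the rows whose `id` is in the list, and that mask selects the matching `x` values.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'x': [20, 10, 10, 30],
    'list_id': [[2, 4], [1, 3], [1], [1, 2]],
})

first = df['list_id'].iloc[0]      # [2, 4]
mask = df['id'].isin(first)        # True where id is 2 or 4
picked = df['x'][mask].tolist()    # x values of those rows

print(mask.tolist())  # [False, True, False, True]
print(picked)         # [10, 30]
```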

Full example with dummy data:

import pandas as pd

data = {
    'id': [1, 2, 3, 4],
    'x': [20, 10, 10, 30],
    'list_id': [[2, 4], [1, 3], [1], [1, 2]],
}

df = pd.DataFrame(data)

# For each row, select the x values whose id appears in that row's list_id
df['list_x'] = [df.x[df['id'].isin(l)].tolist() for l in df.list_id]

Output:

print(df)

   id   x list_id    list_x
0   1  20  [2, 4]  [10, 30]
1   2  10  [1, 3]  [20, 10]
2   3  10     [1]      [20]
3   4  30  [1, 2]  [20, 10]
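One caveat about the list comprehension: `df.x[df['id'].isin(l)]` returns values in the DataFrame's row order, not in the order the ids appear in `l`. The sample lists happen to be sorted, so the two agree here. If list order matters, a dictionary lookup (a swapped-in alternative, not part of the original answer) follows each list's own order. The reversed first list `[4, 2]` in this sketch is made up to show the difference:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'x': [20, 10, 10, 30],
    'list_id': [[4, 2], [1, 3], [1], [1, 2]],  # first list reversed on purpose
})

# isin-based selection: results follow df's row order
by_isin = [df.x[df['id'].isin(l)].tolist() for l in df.list_id]

# Dictionary lookup: results follow the order inside each list.
# Raises KeyError if a list contains an id that is not in df['id'].
id_to_x = dict(zip(df['id'], df['x']))
by_lookup = [[id_to_x[i] for i in l] for l in df.list_id]

print(by_isin[0])    # [10, 30]
print(by_lookup[0])  # [30, 10]
```

If missing ids should not be an error, `id_to_x.get(i)` returns `None` for them instead of raising.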

A creative solution: use numpy object arrays whose elements are sets:

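import numpy as np

# Singleton sets of each id: {1}, {2}, {3}, {4}
i = np.array([set([v]) for v in df['id'].tolist()])
# 1-D object array of one-element lists: [20], [10], [10], [30]
x = np.empty(i.shape, dtype=object)
x[:] = [[v] for v in df['x'].tolist()]
# Same shape, every cell holding an empty list
y = np.empty_like(x)
y.fill([])
# One set per row, built from list_id
j = np.array([set(v) for v in df['list_id'].tolist()])
# (i <= j[:, None])[r, c] is the subset test {id_c} <= set(list_id_r);
# where() picks [x_c] or [], and sum(1) concatenates each row's lists
df.assign(list_x=np.where(i <= j[:, None], x, y).sum(1))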

   id   x list_id    list_x
0   1  20  [2, 4]  [10, 30]
1   2  10  [1, 3]  [20, 10]
2   3  10     [1]      [20]
3   4  30  [1, 2]  [20, 10]
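The key step is `i <= j[:, None]`: on object arrays, `<=` applies Python's set subset test element by element, and broadcasting shape `(4,)` against `(4, 1)` produces a 4x4 table where entry `[r, c]` says whether the `id` of row `c` appears in the `list_id` of row `r`. A standalone sketch of just that mask, using the same sample values as plain lists (the names `ids`, `xs`, `list_ids` are illustrative):

```python
import numpy as np

ids = [1, 2, 3, 4]
xs = [20, 10, 10, 30]
list_ids = [[2, 4], [1, 3], [1], [1, 2]]

i = np.array([{v} for v in ids])          # singleton sets, shape (4,)
j = np.array([set(l) for l in list_ids])  # one set per row, shape (4,)

# mask[r, c] is True when ids[c] is in list_ids[r]
mask = (i <= j[:, None]).astype(bool)

# Reading the mask row by row gives the same list_x values
list_x = [[xs[c] for c in np.flatnonzero(row)] for row in mask]
print(list_x)  # [[10, 30], [20, 10], [20], [20, 10]]
```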

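How did you create it? It depends on what you want to do. You are clearly trying to determine whether id is in list_id, but it is not clear what operation you want to perform.

I need to create a new column from the list_id column.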
%timeit df.assign(list_x=[df.x[df['id'].isin(l)].values for l in df.list_id])

1000 loops, best of 3: 1.21 ms per loop

%%timeit 
i = np.array([set([x]) for x in df.id.values.tolist()])
x = np.empty(i.shape, dtype=object)
x[:] = [[x] for x in df.x.values.tolist()]
y = np.empty_like(x)
y.fill([])
j = np.array([set(x) for x in df.list_id.values.tolist()])

df.assign(list_x=np.where(i <= j[:, None], x, y).sum(1))

1000 loops, best of 3: 371 µs per loop
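`%timeit` and `%%timeit` are IPython magics. Outside IPython, a rough equivalent with the standard `timeit` module might look like the sketch below. The function names `with_isin` and `with_sets` are hypothetical wrappers around the two answers (the first uses `.tolist()` so both return plain lists), and timings on a 4-row frame say little about larger data, so rerunning on realistic sizes is advisable.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'x': [20, 10, 10, 30],
    'list_id': [[2, 4], [1, 3], [1], [1, 2]],
})

def with_isin(df):
    # Hypothetical wrapper around the list-comprehension answer
    return df.assign(list_x=[df.x[df['id'].isin(l)].tolist() for l in df.list_id])

def with_sets(df):
    # Hypothetical wrapper around the numpy object-array answer
    i = np.array([{v} for v in df['id'].tolist()])
    x = np.empty(i.shape, dtype=object)
    x[:] = [[v] for v in df['x'].tolist()]
    y = np.empty_like(x)
    y.fill([])
    j = np.array([set(v) for v in df['list_id'].tolist()])
    return df.assign(list_x=np.where(i <= j[:, None], x, y).sum(1))

# Both approaches should agree before comparing speed
assert with_isin(df)['list_x'].tolist() == with_sets(df)['list_x'].tolist()

for fn in (with_isin, with_sets):
    # Best of 3 repeats, 1000 calls each, reported per call
    seconds = min(timeit.repeat(lambda: fn(df), number=1000, repeat=3))
    print(f"{fn.__name__}: {seconds / 1000 * 1e6:.0f} µs per loop")
```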