Python: slicing one DataFrame based on another DataFrame
I created the following pandas DataFrame from a list of IDs:
In [8]: df = pd.DataFrame({'groups' : [1,2,3,4],
   ...:                    'id' : ["[1,3]","[2]","[5]","[4,6,7]"]})
Out[9]:
   groups       id
0       1    [1,3]
1       2      [2]
2       3      [5]
3       4  [4,6,7]
I also have another DataFrame, as follows:
In [12]: df2 = pd.DataFrame({'id' : [1,2,3,4,5,6,7],
    ...:                     'path' : ["p1,p2,p3,p4","p1,p2,p1","p1,p5,p5,p7","p1,p2,p3,p3","p1,p2","p1","p2,p3,p4"]})
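I need to get the path values for each group.
For example, group 1 contains ids 1 and 3, so it should get the paths of id 1 and id 3.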
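I'm not sure this is the best way, but it works for me. Note that it only works if the id column in df is created without the quote marks, i.e. as real lists rather than strings: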
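import itertools

import pandas as pd

# id must hold real lists here, not strings such as "[1,3]"
df = pd.DataFrame({'groups' : [1,2,3,4],
                   'id' : [[1,3],[2],[5],[4,6,7]]})
df2 = pd.DataFrame({'id' : [1,2,3,4,5,6,7],
                    'path' : ["p1,p2,p3,p4","p1,p2,p1","p1,p5,p5,p7","p1,p2,p3,p3","p1,p2","p1","p2,p3,p4"]})

# one (initially empty) list of paths per group; assumes one row per group and index 0..n-1
paths = [[] for group in df.groups.unique()]
for x in df.index:
    # look up the path of every id in this row's list and append them all
    paths[x].extend(itertools.chain(*[list(df2[df2.id == int(y)]['path']) for y in df.id[x]]))
df['paths'] = pd.Series(paths)
df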
There may be a cleaner way to do this, but it is a somewhat odd data structure. This gives the following output:
   groups         id                        paths
0       1     [1, 3]   [p1,p2,p3,p4, p1,p5,p5,p7]
1       2        [2]                   [p1,p2,p1]
2       3        [5]                      [p1,p2]
3       4  [4, 6, 7]  [p1,p2,p3,p3, p1, p2,p3,p4]
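The question's id column actually holds strings such as "[1,3]", which is why the loop needs real lists. A minimal sketch that works on the strings directly, assuming every entry is a valid Python list literal: parse them with the standard library's ast.literal_eval, then replace the per-id filtering of df2 with a single dict lookup. The name path_by_id is only illustrative.

```python
import ast

import pandas as pd

# Data exactly as in the question: the id column holds strings like "[1,3]".
df = pd.DataFrame({'groups': [1, 2, 3, 4],
                   'id': ["[1,3]", "[2]", "[5]", "[4,6,7]"]})
df2 = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7],
                    'path': ["p1,p2,p3,p4", "p1,p2,p1", "p1,p5,p5,p7", "p1,p2,p3,p3",
                             "p1,p2", "p1", "p2,p3,p4"]})

# Parse each string into a real list; literal_eval only evaluates Python
# literals, so it is much safer than eval on text like this.
df['id'] = df['id'].apply(ast.literal_eval)

# Build an id -> path lookup once instead of filtering df2 for every id.
path_by_id = dict(zip(df2['id'], df2['path']))

# For each group, collect the path of every id in its list (order preserved).
df['paths'] = df['id'].apply(lambda ids: [path_by_id[i] for i in ids])
print(df)
```

An id that is missing from df2 raises KeyError here; using path_by_id.get(i) instead would put None in its place.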
You shouldn't build the DataFrame with embedded list objects. Instead, repeat each group as many times as it has IDs, and then use pandas.merge, as follows:
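In [142]: import numpy as np
     ...: import pandas as pd
     ...: from functools import reduce  # reduce is no longer a builtin in Python 3
In [143]: groups = list(range(1, 5))
In [144]: ids = [[1, 3], [2], [5], [4, 6, 7]]
In [145]: df = pd.DataFrame({'groups': np.repeat(groups, list(map(len, ids))),
     ...:                    'id': reduce(lambda x, y: x + y, ids)})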
In [146]: df2 = pd.DataFrame({'id' : [1,2,3,4,5,6,7],
     ...:                     'path' : ["p1,p2,p3,p4","p1,p2,p1","p1,p5,p5,p7","p1,p2,p3,p3","p1,p2","p1","p2,p3,p4"]})
In [147]: df
Out[147]:
   groups  id
0       1   1
1       1   3
2       2   2
3       3   5
4       4   4
5       4   6
6       4   7

[7 rows x 2 columns]
In [148]: df2
Out[148]:
   id         path
0   1  p1,p2,p3,p4
1   2     p1,p2,p1
2   3  p1,p5,p5,p7
3   4  p1,p2,p3,p3
4   5        p1,p2
5   6           p1
6   7     p2,p3,p4

[7 rows x 2 columns]
In [149]: pd.merge(df, df2, on='id', how='outer')
Out[149]:
   groups  id         path
0       1   1  p1,p2,p3,p4
1       1   3  p1,p5,p5,p7
2       2   2     p1,p2,p1
3       3   5        p1,p2
4       4   4  p1,p2,p3,p3
5       4   6           p1
6       4   7     p2,p3,p4

[7 rows x 3 columns]
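The merged frame still has one row per (group, id) pair, while the question asks for the paths of each group. A minimal sketch of the whole pipeline, assuming pandas 1.1 or later (DataFrame.explode was added in 0.25, its ignore_index argument in 1.1): explode builds the long format without np.repeat and reduce, and a groupby collects the paths back into one list per group. It deliberately uses how='left' instead of the answer's how='outer', so ids that belong to no group are not added as extra rows; this swap is not part of the original answer.

```python
import pandas as pd

groups = [1, 2, 3, 4]
ids = [[1, 3], [2], [5], [4, 6, 7]]
df2 = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7],
                    'path': ["p1,p2,p3,p4", "p1,p2,p1", "p1,p5,p5,p7", "p1,p2,p3,p3",
                             "p1,p2", "p1", "p2,p3,p4"]})

# explode() emits one row per list element and repeats the group value.
# It leaves 'id' as an object column, so cast it to int to match df2's key.
df = (pd.DataFrame({'groups': groups, 'id': ids})
        .explode('id', ignore_index=True)
        .astype({'id': int}))

# how='left' keeps every (group, id) row; an id missing from df2 would get NaN.
merged = pd.merge(df, df2, on='id', how='left')

# Collapse back to one row per group, listing its paths in id order.
paths_per_group = merged.groupby('groups')['path'].agg(list)
print(paths_per_group)
```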