Python: 'NaN' after grouping and string joining on a DataFrame

Tags: python, pandas, dataframe, pandas-groupby

I have a DataFrame like this:

name  | weekday | count
Peter | Friday  | {16, 17, 9, 10, 15}
Peter | Friday  | {10, 11, 14}
Peter | Friday  | {16, 17, 11, 12, 15}
Bob   | Friday  | {10}
Bob   | Friday  | {9, 10, 11, 12, 13}
Bob   | Friday  | {9, 10, 11, 14, 15}
I want to group by name and weekday and add a new column holding the intersection of the count sets, like this:

name  | weekday | intersection
Peter | Friday  |
Bob   | Friday  | 10
When there is no intersection, it should return an empty string. Here is the code I used:

df.groupby(['name','weekday']).apply(lambda x: pd.Series({'intersection': ", ".join("{0}".format(n) for n in sorted(list(set.intersection(*x['count']))))})).reset_index()
But the result I get looks like this:

name  | weekday | intersection
Peter | Friday  | Nan
Bob   | Friday  | 10
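I tried ''.join() on an empty list, and it works and returns an empty string, but it does not work after the group by. I don't know why it behaves this way or how to fix it.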
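To isolate the string-building step from pandas, here is a minimal sketch that runs the same expression on Peter's three sets from the example, outside any groupby. The variable names peter_sets, common, and joined are made up for illustration.

```python
# Peter's three sets from the example table (no element is shared by all three).
peter_sets = [{16, 17, 9, 10, 15}, {10, 11, 14}, {16, 17, 11, 12, 15}]

# Same expression as in the lambda, applied to plain Python sets.
common = set.intersection(*peter_sets)
joined = ", ".join("{0}".format(n) for n in sorted(common))

print(common)        # set()
print(repr(joined))  # ''
```

The expression itself yields an empty string here, which suggests the NaN is introduced later, when the per-group results are assembled, rather than by the join.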

Find the intersection, then "stringify" and join it:

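from functools import reduce
import pandas as pd

def get_intersection(s: pd.Series) -> str:
    # Fold the group's sets together, starting from the first one
    intersect = reduce(lambda a, b: a.intersection(b), s.iloc[1:], s.iat[0])
    return ", ".join([str(x) for x in intersect])

intersection = (df.groupby(['name', 'weekday'])['count']
    .agg(get_intersection)
    .rename('intersection')
    .reset_index()
)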
This gives you:

print(intersection)
    name weekday intersection
0    Bob  Friday           10
1  Peter  Friday

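If you are dealing with a large dataset with little overlap, a while len(intersect) > 0 loop may work better than reduce, because it avoids unnecessary processing once the intersection is already empty.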
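One way the early-exit idea could look is sketched below. The function name intersect_until_empty is hypothetical, and the sorted() call is an added assumption to make the output order deterministic; the answer's reduce version does not sort.

```python
from typing import Iterable, Set


def intersect_until_empty(sets: Iterable[Set[int]]) -> str:
    """Intersect the sets one by one, stopping once nothing is left."""
    values = iter(sets)
    # Copy the first set so the caller's data is never mutated.
    intersect = set(next(values, set()))
    for other in values:
        if not intersect:
            break  # already empty: further intersections cannot change it
        intersect &= other
    return ", ".join(str(x) for x in sorted(intersect))


peter = [{16, 17, 9, 10, 15}, {10, 11, 14}, {16, 17, 11, 12, 15}]
bob = [{10}, {9, 10, 11, 12, 13}, {9, 10, 11, 14, 15}]

print(repr(intersect_until_empty(peter)))  # ''
print(repr(intersect_until_empty(bob)))    # '10'
```

Because a pandas Series iterates over its values, a function of this shape should be usable in place of get_intersection in the .agg(...) call above; that part is not exercised here.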

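My DataFrame is actually larger than the example, and for some other rows it returns an empty string instead of NaN when the intersection is empty. Do you know why it does that?

You could fillna with '' afterwards.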
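A minimal sketch of that workaround, assuming the grouped result looks like the NaN table from the question. The frame named result is constructed by hand here purely for illustration.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the grouped result that contains a NaN.
result = pd.DataFrame({
    "name": ["Peter", "Bob"],
    "weekday": ["Friday", "Friday"],
    "intersection": [np.nan, "10"],
})

# Replace missing values in the column with an empty string.
result["intersection"] = result["intersection"].fillna("")

print(result["intersection"].tolist())  # ['', '10']
```

This only patches the output; it does not explain why some groups come back as NaN in the first place.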