Python: find elements with the same id in a DataFrame and group them

I have a DataFrame like this:

id  value1 value2
 0     1     1
 1     2     3
 2     1     4
 3     1     5
 4     2     1

I want it to look like this:

id  value1 value2  count
 0     1     1,4,5   3
 1     2     3,1     2
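Use GroupBy.agg with join and size. The column has to be converted to strings first, because str.join only accepts strings:

# Each tuple is (output column name, aggregation)
tups = [('value2', lambda x: ','.join(x.astype(str))), ('count', 'size')]
df1 = df.groupby('value1')['value2'].agg(tups).reset_index()
print(df1)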
   value1 value2  count
0       1  1,4,5      3
1       2    3,1      2
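
To illustrate why the string conversion is needed, here is a minimal sketch on a standalone Series (the name `values` is only for illustration and stands in for one group of value2). Joining integers directly raises a TypeError, while joining after astype(str) works:

```python
import pandas as pd

# Integers, like the value2 entries of the group value1 == 1 in the question
values = pd.Series([1, 4, 5])

try:
    ",".join(values)              # str.join rejects non-string items
    joined_without_cast = True
except TypeError:
    joined_without_cast = False

# Casting to str first makes the join valid
joined = ",".join(values.astype(str))
print(joined_without_cast, joined)
```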
Alternative:

# Cast to strings before grouping, so ','.join can be passed directly
tups = [('value2', ','.join), ('count', 'size')]
df1 = df['value2'].astype(str).groupby(df['value1']).agg(tups).reset_index()
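
For reference, a self-contained version that can be run as-is. This is a sketch rather than the answer's exact code: the DataFrame is rebuilt from the question's sample (assuming the leading "id" column is just the row index), and it uses keyword-style named aggregation, which recent pandas versions support, instead of the list of tuples:

```python
import pandas as pd

# Sample data reconstructed from the question; "id" is assumed to be the index
df = pd.DataFrame({"value1": [1, 2, 1, 1, 2], "value2": [1, 3, 4, 5, 1]})

df1 = (
    df.assign(value2=df["value2"].astype(str))   # strings are required by join
      .groupby("value1")
      .agg(value2=("value2", ",".join),          # comma-joined values per group
           count=("value2", "size"))             # number of rows per group
      .reset_index()
)
print(df1)
```

The output columns are value1, value2 and count, matching the layout requested in the question.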

Something like this also works:

In [2468]: df.value2 = df['value2'].apply(str)

In [2494]: res = df.groupby('value1')['value2'].apply(lambda x:','.join(x)).reset_index()

In [2498]: res['count'] = df.groupby('value1').size().reset_index()[0]

In [2499]: res
Out[2499]: 
   value1 value2  count
0       1  1,4,5      3
1       2    3,1      2
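
The same two-step idea can also be written without relying on the default column label 0. A possible sketch, assuming the sample data from the question (the names `joined` and `counts` are only illustrative): name the size column explicitly with reset_index(name=...) and merge on value1.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({"value1": [1, 2, 1, 1, 2], "value2": [1, 3, 4, 5, 1]})

# Step 1: comma-join the stringified value2 entries per value1 group
joined = (
    df["value2"].astype(str)
      .groupby(df["value1"])
      .agg(",".join)
      .reset_index()
)

# Step 2: group sizes, with the column named explicitly
counts = df.groupby("value1").size().reset_index(name="count")

# Combine on the grouping key instead of relying on row order
res = joined.merge(counts, on="value1")
print(res)
```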

What dtype should the column value2 be? String? Also, it looks like you want to group by value1, not by id.

It works great. Thanks!