Python 3.x ValueError: Wrong number of items passed 2, placement implies 1, when grouping by two columns and transforming the groups into counts

python-3.x pandas

I have a pandas DataFrame that looks like this:

 df =

 user_id  item_id  time  location
  u1      i1        t1.   l1
  u2      i1        t2    l2
  u1      i2        t3    l1
  u3      i2        t4    l2
  u4      i1        t5    l1
  u5      i1        t6    l1

Expected output:

  df =
 user_id  item_id  time  location count
  u1      i1       t1.   l1         3
  u2      i1       t2    l2         1
  u1      i2       t3    l1         1
  u3      i2       t4    l2         1
  u4      i1       t5    l1         3
  u5      i1       t6    l1         3

I am simply trying to group by item_id and location and count how many times each group occurs.

Here is the code, and it works:

 df.groupby(['item_id', 'location']).size()
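To make clear why this result cannot simply be assigned as a new column, here is a small sketch of what `size()` returns, plus one alternative (not taken from the answer below) for attaching it back with `merge`. It assumes the sample data from the question with `time` kept as plain strings; the names `sizes`, `counts` and `merged` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: 'time' kept as plain strings)
df = pd.DataFrame({
    "user_id": ["u1", "u2", "u1", "u3", "u4", "u5"],
    "item_id": ["i1", "i1", "i2", "i2", "i1", "i1"],
    "time": ["t1", "t2", "t3", "t4", "t5", "t6"],
    "location": ["l1", "l2", "l1", "l2", "l1", "l1"],
})

# size() returns one value per (item_id, location) group, indexed by a
# MultiIndex, so it has 4 rows here instead of one value per original row
sizes = df.groupby(["item_id", "location"]).size()
print(sizes)

# One way to attach it back: turn the Series into a frame and left-merge on the
# group keys; a left merge keeps the row order of df
counts = sizes.reset_index(name="count")
merged = df.merge(counts, on=["item_id", "location"], how="left")
print(merged)
```

The merge route needs an extra intermediate frame; the `transform` approach in the answer avoids that by returning a result already aligned to the original rows.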

However, I want to attach this grouping back to df.

To do that, I tried the following:

  data.groupby(['item_id', 'location'])['user_id','time'].transform('size')

However, I got the following error:

 IndexError: Column(s) ['user_id', 'time'] already selected

Then I did this:

 data.groupby(['item_id', 'location'])['user_id','time'].transform('count')

It runs, but it does not give the desired output.

I also tried:

   data.groupby(['item_id', 'location']).transform('sum')

However, this produces a different error:

   TypeError: unsupported operand type(s) for +: 'Timestamp' and 'Timestamp'

So how can I group by two columns, count the occurrences, and attach the result back to the DataFrame?

For me it works if you select only one column after the groupby, because you need a single new column:

data['count1'] = data.groupby(['item_id', 'location'])['user_id'].transform('size')
data['count2'] = data.groupby(['item_id', 'location'])['user_id'].transform('count')
print (data)
  user_id item_id time location  count1  count2
0      u1      i1  t1.       l1       3       3
1      u2      i1   t2       l2       1       1
2      u1      i2   t3       l1       1       1
3      u3      i2   t4       l2       1       1
4      u4      i1   t5       l1       3       3
5      u5      i1   t6       l1       3       3
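A self-contained version of the two lines above, assuming the question's sample data with `time` stored as plain strings, might look like this. Selecting one column yields a SeriesGroupBy, and `transform` returns one value per original row, which is why it can be assigned directly.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: 'time' kept as plain strings)
df = pd.DataFrame({
    "user_id": ["u1", "u2", "u1", "u3", "u4", "u5"],
    "item_id": ["i1", "i1", "i2", "i2", "i1", "i1"],
    "time": ["t1", "t2", "t3", "t4", "t5", "t6"],
    "location": ["l1", "l2", "l1", "l2", "l1", "l1"],
})

keys = ["item_id", "location"]

# Selecting one column gives a SeriesGroupBy; transform broadcasts the
# per-group result back to every original row, so it fits a single new column
df["count1"] = df.groupby(keys)["user_id"].transform("size")   # rows per group
df["count2"] = df.groupby(keys)["user_id"].transform("count")  # non-NaN user_id values per group
print(df)
```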

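The two methods differ: size counts the rows in each group, while count counts the non-NaN values of a column. So count can be tested with multiple columns: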

# no missing values, so both columns give the same output
data[['count2','count3']] = data.groupby(['item_id', 'location'])[['user_id', 'time']].transform('count')
print (data)
  user_id item_id time location  count2  count3
0      u1      i1  t1.       l1       3       3
1      u2      i1   t2       l2       1       1
2      u1      i2   t3       l1       1       1
3      u3      i2   t4       l2       1       1
4      u4      i1   t5       l1       3       3
5      u5      i1   t6       l1       3       3
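The difference only shows up once values are missing. This sketch uses the same NaN positions as the answer's last table (an assumption about the test data) and shows that `count` is computed per column, so the two new columns diverge. It assigns each column by name instead of `data[['count2','count3']] = ...`, to avoid relying on how a multi-column assignment matches the right-hand columns.

```python
import numpy as np
import pandas as pd

# Sample data with missing values in the same places as the answer's last table
df = pd.DataFrame({
    "user_id": ["u1", "u2", "u1", np.nan, np.nan, "u5"],
    "item_id": ["i1", "i1", "i2", "i2", "i1", "i1"],
    "time": ["t1", np.nan, "t3", "t4", "t5", "t6"],
    "location": ["l1", "l2", "l1", "l2", "l1", "l1"],
})
keys = ["item_id", "location"]

# 'count' is computed column by column and skips NaN
counted = df.groupby(keys)[["user_id", "time"]].transform("count")

# Assign each result column explicitly by name
df["count2"] = counted["user_id"]
df["count3"] = counted["time"]
print(df)
```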

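Testing size with multiple columns fails. I think it is a bug, or perhaps the developers decided that selecting multiple columns makes no sense for size: it does not exclude NaN, so every column would get the same values: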

data[['count2','count3']] = data.groupby(['item_id', 'location'])[['user_id', 'time']].transform('size')

print (data)

IndexError: Column(s) ['user_id', 'time'] already selected

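Because of this error, verify that size works correctly by applying it to each column separately. Here some values were replaced by NaN, and size still counts every row: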

data['count2'] = data.groupby(['item_id', 'location'])['user_id'].transform('size')
data['count3'] = data.groupby(['item_id', 'location'])[ 'time'].transform('size')
print (data)

  user_id item_id time location  count2  count3
0      u1      i1  t1.       l1       3       3
1      u2      i1  NaN       l2       1       1
2      u1      i2   t3       l1       1       1
3     NaN      i2   t4       l2       1       1
4     NaN      i1   t5       l1       3       3
5      u5      i1   t6       l1       3       3
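To put both behaviours side by side, here is a small sketch on the same NaN data (again an assumption that mirrors the table above): `size` returns the same group sizes whichever column is selected, while `count` on `user_id` drops the missing values.

```python
import numpy as np
import pandas as pd

# Same sample data with missing values as in the table above
df = pd.DataFrame({
    "user_id": ["u1", "u2", "u1", np.nan, np.nan, "u5"],
    "item_id": ["i1", "i1", "i2", "i2", "i1", "i1"],
    "time": ["t1", np.nan, "t3", "t4", "t5", "t6"],
    "location": ["l1", "l2", "l1", "l2", "l1", "l1"],
})
g = df.groupby(["item_id", "location"])

# 'size' counts rows, so NaN does not matter and the selected column is irrelevant
size_user = g["user_id"].transform("size")
size_time = g["time"].transform("size")
# 'count' skips NaN, so it can be smaller than 'size'
count_user = g["user_id"].transform("count")

print(pd.DataFrame({"size_user": size_user, "size_time": size_time, "count_user": count_user}))
```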

Yes, it is working. Do you know why we need to add one more column instead of time? How is time handled internally?

@SumitSidana - I think any column can be used here after the groupby. Why it raises IndexError is odd; it looks like a bug. If you add both columns, user_id and time, does it work for you?

@SumitSidana - I tried to edit the answer to give a better explanation.

Yes, thanks for the detailed answer. There are no NaNs in my data, so count2 and count3 are the same.