Python pivot table: count frequency in one column


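I am still new to pivot_table in Python pandas and would like to ask how to count the frequency of values in one column, where that column is also linked to another ID column.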

import pandas as pd
df = pd.DataFrame({'Account_number':[1,1,2,2,2,3,3],
                   'Product':['A', 'A', 'A', 'B', 'B','A', 'B']
                  })

For the output, I would like to get something like this:

                Product
                A      B
Account_number           
      1         2      0
      2         1      2
      3         1      1


So far, I have tried the following code:

df.pivot_table(rows = 'Account_number', cols= 'Product', aggfunc='count')

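This code gives me the same thing twice. What is wrong with the code above? Part of the reason I am asking is that this DataFrame is only an example; the real data I am working with has tens of thousands of account numbers. Thanks in advance for your help.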

You need to specify aggfunc as len:

In [11]: df.pivot_table(index='Account_number', columns='Product', 
                        aggfunc=len, fill_value=0)
Out[11]:
Product         A  B
Account_number
1               2  0
2               1  2
3               1  1

It looks like count is counting the instances of each column (Account_number and Product); it is not clear to me whether this is a bug...
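In newer versions of pandas a slight modification is required: the rows and cols keywords used in the question are now index and columns. It took me some time to figure this out, so I am adding it here so that others can use it directly: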
df.pivot_table(index='Account_number', columns='Product', aggfunc=len,
               fill_value=0)

You can use count:
df.pivot_table(index='Account_number', columns='Product', aggfunc='count')
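Solution: use aggfunc='size'

Using aggfunc=len or aggfunc='count', as all the other answers on this page do, will not work for DataFrames with more than three columns. By default, pandas applies the aggfunc to every column not named in the index or columns parameters.
For instance, if the original DataFrame had two more columns and were defined like this: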
df = pd.DataFrame({'Account_number':[1, 1, 2 ,2 ,2 ,3 ,3], 
                   'Product':['A', 'A', 'A', 'B', 'B','A', 'B'], 
                   'Price': [10] * 7,
                   'Quantity': [100] * 7})

Output:
   Account_number Product  Price  Quantity
0               1       A     10       100
1               1       A     10       100
2               2       A     10       100
3               2       B     10       100
4               2       B     10       100
5               3       A     10       100
6               3       B     10       100


If you apply the current solutions to this DataFrame, you get the following:

df.pivot_table(index='Account_number',
               columns='Product',
               aggfunc=len,
               fill_value=0)

Output:
                  Price    Quantity   
Product            A  B        A  B
Account_number                     
1                  2  0        2  0
2                  1  2        1  2
3                  1  1        1  1
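
One way to keep len or 'count' from fanning out across every leftover column is to name a single column in the values argument. The sketch below assumes the four-column df defined above and picks Price arbitrarily; this is an illustration rather than something proposed in the answers.

```python
import pandas as pd

df = pd.DataFrame({'Account_number': [1, 1, 2, 2, 2, 3, 3],
                   'Product': ['A', 'A', 'A', 'B', 'B', 'A', 'B'],
                   'Price': [10] * 7,
                   'Quantity': [100] * 7})

# Without `values`, the aggregation runs on Price and Quantity separately,
# so the result carries two levels of column labels.
wide = df.pivot_table(index='Account_number', columns='Product',
                      aggfunc='count', fill_value=0)

# Restricting the aggregation to one column leaves a single level (Product).
narrow = df.pivot_table(index='Account_number', columns='Product',
                        values='Price', aggfunc='count', fill_value=0)
print(narrow)
```

Note that 'count' skips missing values, so the numbers depend on which column is chosen whenever that column contains NaN.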


Solution
Instead, use aggfunc='size'. Since size always returns the same number for each column, pandas does not call it on every single column and just does it once.
df.pivot_table(index='Account_number', 
               columns='Product',
               aggfunc='size',
               fill_value=0)

Output:
Product         A  B
Account_number      
1               2  0
2               1  2
3               1  1
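
The same frequency table can also be built without pivot_table. As a sketch of two alternatives (neither appears in the answers on this page), pd.crosstab and groupby(...).size().unstack(...) both count rows per (Account_number, Product) pair and ignore any other columns in the frame.

```python
import pandas as pd

df = pd.DataFrame({'Account_number': [1, 1, 2, 2, 2, 3, 3],
                   'Product': ['A', 'A', 'A', 'B', 'B', 'A', 'B'],
                   'Price': [10] * 7,
                   'Quantity': [100] * 7})

# crosstab counts co-occurrences of the two key columns directly.
by_crosstab = pd.crosstab(df['Account_number'], df['Product'])

# groupby + size counts rows per pair; unstack moves Product into the columns
# and fills pairs that never occur with 0.
by_groupby = (df.groupby(['Account_number', 'Product'])
                .size()
                .unstack(fill_value=0))

print(by_crosstab)
```

Both variants only look at the two key columns, so they behave the same no matter how many extra columns the real data has.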

@AndyHayden, +1. I don't think it is a bug, but I would like the behaviour to be more consistent; see: df.pivot_table(rows='Account_number', cols='Product', aggfunc=sum, fill_value=0)
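@CTZhu I think it may be a bug (columns you do not want to use are included in the aggregation, and in fact they make no sense for sum!)
InvalidIndexError: Reindexing only valid with uniquely valued Index objects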
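None of the answers on this page work for DataFrames with more than 3 columns. The idiomatic solution is to use aggfunc='size'. See the answer above for more detail.
There is a bug here: "By default, pandas will apply this aggfunc to all the columns not found in the index or columns parameters." This feels like a bug.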