What causes a wrong count from groupby and transform.count() operations in Python (Python, Pandas)


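I am grouping my dataframe and counting.

This is the result I get from the .describe() method:

The count is 5, while all other metrics are 4. In fact, there are only 4 barcodes in this group, so the count should be 4. How can the count be 5?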

invoice_number        barcode
OFF1540673            4054673005837  count                                   5.0
                                     mean                                    4.0
                                     std                                     0.0
                                     min                                     4.0
                                     25%                                     4.0
                                     50%                                     4.0
                                     75%                                     4.0
                                     max                                     4.0
                      4054673034394  count                                   5.0
                                     mean                                    4.0
                                     std                                     0.0
                                     min                                     4.0
                                     25%                                     4.0
                                     50%                                     4.0
                                     75%                                     4.0
                                     max                                     4.0
                      4054673238488  count                                   5.0
                                     mean                                    4.0
                                     std                                     0.0
                                     min                                     4.0
                                     25%                                     4.0
                                     50%                                     4.0
                                     75%                                     4.0
                                     max                                     4.0
                      4054673238822  count                                   5.0
                                     mean                                    4.0
                                     std                                     0.0
                                     min                                     4.0
                                     25%                                     4.0
                                     50%                                     4.0
                                     75%                                     4.0
                                     max                                     4.0
Update

Original dataset

              invoice_number  barcode
327378            OFF1540673  4054673238488
327379            OFF1540673  4054673034394
327380            OFF1540673  4054673238822
327381            OFF1540673  4054673005837
327382            OFF1540673  4054673238488
327383            OFF1540673  4054673034394
327384            OFF1540673  4054673238822
327385            OFF1540673  4054673005837
327386            OFF1540673  4054673238488
327387            OFF1540673  4054673034394
327388            OFF1540673  4054673238822
327389            OFF1540673  4054673005837
327390            OFF1540673  4054673238488
327391            OFF1540673  4054673034394
327392            OFF1540673  4054673238822
327393            OFF1540673  4054673005837
327394            OFF1540673  4054673238488
327395            OFF1540673  4054673034394
327396            OFF1540673  4054673238822
327397            OFF1540673  4054673005837
The dtype of both columns is object.

This is the grouping command:

print(data.groupby(['invoice_number','barcode'])['invoice_number'].describe())

UPDATE: I can't reproduce your problem using the provided dataset:

In [16]: df.groupby(['invoice_number','barcode'])['invoice_number'].describe()
Out[16]:
invoice_number  barcode
OFF1540673      4054673005837  count              5
                               unique             1
                               top       OFF1540673
                               freq               5
                4054673034394  count              5
                               unique             1
                               top       OFF1540673
                               freq               5
                4054673238488  count              5
                               unique             1
                               top       OFF1540673
                               freq               5
                4054673238822  count              5
                               unique             1
                               top       OFF1540673
                               freq               5
Name: invoice_number, dtype: object

In [17]: df.groupby(['invoice_number','barcode'])['invoice_number'].count()
Out[17]:
invoice_number  barcode
OFF1540673      4054673005837    5
                4054673034394    5
                4054673238488    5
                4054673238822    5
Name: invoice_number, dtype: int64
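
The output above counts rows per (invoice_number, barcode) pair, which is not the same as the number of distinct barcodes on an invoice. The following is a minimal sketch of that difference, assuming the 20-row sample from the question is rebuilt by hand; the variable names `barcodes`, `rows_per_pair` and `barcodes_per_invoice` are illustrative and not part of the original post.

```python
import pandas as pd

# Rebuild the sample from the question: 4 barcodes, each appearing 5 times
# under the same invoice number (the barcode order repeats every 4 rows).
barcodes = ["4054673238488", "4054673034394", "4054673238822", "4054673005837"]
df = pd.DataFrame(
    {
        "invoice_number": ["OFF1540673"] * 20,
        "barcode": barcodes * 5,
    },
    index=range(327378, 327398),
)

# count() reports the number of non-null rows in each (invoice, barcode) group.
rows_per_pair = df.groupby(["invoice_number", "barcode"])["invoice_number"].count()

# nunique() reports how many distinct barcodes each invoice has.
barcodes_per_invoice = df.groupby("invoice_number")["barcode"].nunique()

print(rows_per_pair)
print(barcodes_per_invoice)
```

Under these assumptions each pair has 5 rows, while the invoice has 4 distinct barcodes, so both numbers are consistent with the data.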

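The group contains 4 barcodes (see above), but the count is 5 instead of 4. Is this really related to NaNs?

@Jabb, thanks for the dataset! I can't reproduce your problem; please see the UPDATE.

In your reproduction the count is 5, while there are 4 barcodes. Shouldn't the count be 4?

Ah! I was looking for something different... I want to know the number of barcodes per order number.

@Jabb, do you mean: df.groupby('invoice_number')['barcode'].nunique() ?

How does your data look before describe? I tried to simulate it with df=pd.DataFrame({'a':[4]*10,'b':['a']*5+['b']*5}) - it seems all values are 4 and the length of each group is 5 (or more if there are NaNs).
print(df.groupby('b')['a'].describe())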
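
The remark about NaNs can be checked with a small variation of the simulated frame from the comment. This is only a sketch: the injected NaN and the names `sim`, `non_null`, `lengths` and `summary` are assumptions added for illustration. It relies on the fact that the groupby `count()` skips missing values, while `size()` returns the full group length.

```python
import numpy as np
import pandas as pd

# Simulated frame from the comment, with one value replaced by NaN (assumption).
sim = pd.DataFrame({"a": [4.0] * 10, "b": ["a"] * 5 + ["b"] * 5})
sim.loc[0, "a"] = np.nan

grouped = sim.groupby("b")["a"]

non_null = grouped.count()    # non-null values per group (NaN is skipped)
lengths = grouped.size()      # rows per group (NaN is included)
summary = grouped.describe()  # its "count" column matches count(), not size()

print(non_null)
print(lengths)
print(summary)
```

When a group contains NaNs, its length is larger than the count reported by describe(), so comparing `count()` with `size()` is a quick way to see whether missing values are involved.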