What causes the wrong count in Python's groupby and transform.count() operations?
I'm grouping and counting on my DataFrame. Below is the result I get from the .describe() method. The count is 5, while the other statistics (mean, min, percentiles, max) are all 4. In fact there are only 4 barcodes in this group, so the count should be 4. How can the count be 5?
invoice_number  barcode
OFF1540673      4054673005837  count    5.0
                               mean     4.0
                               std      0.0
                               min      4.0
                               25%      4.0
                               50%      4.0
                               75%      4.0
                               max      4.0
                4054673034394  count    5.0
                               mean     4.0
                               std      0.0
                               min      4.0
                               25%      4.0
                               50%      4.0
                               75%      4.0
                               max      4.0
                4054673238488  count    5.0
                               mean     4.0
                               std      0.0
                               min      4.0
                               25%      4.0
                               50%      4.0
                               75%      4.0
                               max      4.0
                4054673238822  count    5.0
                               mean     4.0
                               std      0.0
                               min      4.0
                               25%      4.0
                               50%      4.0
                               75%      4.0
                               max      4.0
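The numbers above are consistent with a column whose value is 4 on every row, described over groups that each contain 5 rows: in describe(), count is the number of non-null rows in the group, while mean, min, the percentiles and max summarise the values themselves. A minimal sketch, assuming (hypothetically) that the described column holds the number of distinct barcodes per invoice, computed with transform('nunique'); the column name n_barcodes is made up for illustration:

```python
import pandas as pd

# The four barcodes from the question, each occurring 5 times on one invoice.
barcodes = ["4054673238488", "4054673034394", "4054673238822", "4054673005837"]
df = pd.DataFrame({"invoice_number": "OFF1540673", "barcode": barcodes * 5})

# Hypothetical column: distinct barcodes on the row's invoice,
# broadcast back to every row by transform (value 4 everywhere).
df["n_barcodes"] = df.groupby("invoice_number")["barcode"].transform("nunique")

# 'count' is the number of rows per (invoice, barcode) group (5);
# 'mean', 'min', 'max' describe the values themselves (4).
stats = df.groupby(["invoice_number", "barcode"])["n_barcodes"].describe()
print(stats)
```

In recent pandas versions the statistics come out as columns (count, mean, std, ...) rather than stacked under each barcode as in the output above, but the numbers are the same.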
UPDATE
Original dataset:
        invoice_number        barcode
327378      OFF1540673  4054673238488
327379      OFF1540673  4054673034394
327380      OFF1540673  4054673238822
327381      OFF1540673  4054673005837
327382      OFF1540673  4054673238488
327383      OFF1540673  4054673034394
327384      OFF1540673  4054673238822
327385      OFF1540673  4054673005837
327386      OFF1540673  4054673238488
327387      OFF1540673  4054673034394
327388      OFF1540673  4054673238822
327389      OFF1540673  4054673005837
327390      OFF1540673  4054673238488
327391      OFF1540673  4054673034394
327392      OFF1540673  4054673238822
327393      OFF1540673  4054673005837
327394      OFF1540673  4054673238488
327395      OFF1540673  4054673034394
327396      OFF1540673  4054673238822
327397      OFF1540673  4054673005837
Both columns have dtype object.
This is the grouping command:
print(data.groupby(['invoice_number', 'barcode'])['invoice_number'].describe())
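One way to check where the 5 comes from is to rebuild the dataset above and count the rows per (invoice_number, barcode) pair with size(). A minimal sketch, assuming the barcodes are stored as strings (the question says both columns have dtype object):

```python
import pandas as pd

# Rebuild the 20-row dataset from the question: 4 barcodes repeated 5 times.
barcodes = ["4054673238488", "4054673034394", "4054673238822", "4054673005837"]
df = pd.DataFrame(
    {"invoice_number": "OFF1540673", "barcode": barcodes * 5},
    index=range(327378, 327398),
)

# Rows per (invoice, barcode) pair: each of the 4 pairs occurs 5 times.
sizes = df.groupby(["invoice_number", "barcode"]).size()
print(sizes)
```

So a per-group count of 5 is correct for this grouping: it counts the rows inside each (invoice, barcode) group, while the 4 barcodes correspond to the number of groups, not to the size of any one of them.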
UPDATE: I can't reproduce your problem with the dataset you provided:
In [16]: df.groupby(['invoice_number','barcode'])['invoice_number'].describe()
Out[16]:
invoice_number  barcode
OFF1540673      4054673005837  count            5
                               unique           1
                               top     OFF1540673
                               freq             5
                4054673034394  count            5
                               unique           1
                               top     OFF1540673
                               freq             5
                4054673238488  count            5
                               unique           1
                               top     OFF1540673
                               freq             5
                4054673238822  count            5
                               unique           1
                               top     OFF1540673
                               freq             5
Name: invoice_number, dtype: object
In [17]: df.groupby(['invoice_number','barcode'])['invoice_number'].count()
Out[17]:
invoice_number  barcode
OFF1540673      4054673005837    5
                4054673034394    5
                4054673238488    5
                4054673238822    5
Name: invoice_number, dtype: int64
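count() returns the number of non-null values in each group, while size() returns the number of rows, NaN included, so the two only differ when the column has missing values. Nothing is missing in the dataset above, so both give 5. A small sketch with made-up data (the qty column and the barcodes "A"/"B" are hypothetical) shows when they diverge:

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "A" has one missing value.
df = pd.DataFrame({
    "barcode": ["A", "A", "A", "B", "B"],
    "qty": [1.0, np.nan, 3.0, 2.0, 2.0],
})

grouped = df.groupby("barcode")["qty"]
counts = grouped.count()  # non-null values per group: A -> 2, B -> 2
sizes = grouped.size()    # rows per group, NaN included: A -> 3, B -> 2
print(pd.DataFrame({"count": counts, "size": sizes}))
```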
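The group contains 4 barcodes (see above), but the count is 5 instead of 4. Is this really related to NaNs?
@Jabb, thanks for the dataset! I can't reproduce your problem, see the update.
In your reproduction the count is also 5, while there are 4 barcodes. Shouldn't the count be 4?
Ah! I was looking for something different... I want to know the number of barcodes per invoice_number.
@Jabb, do you mean df.groupby('invoice_number')['barcode'].nunique()?
What does your data look like before describe? I tried to simulate it with df = pd.DataFrame({'a': [4]*10, 'b': ['a']*5 + ['b']*5}): it seems all the values are 4 and the length of each group is 5 (or more, if there are NaNs). print(df.groupby('b')['a'].describe())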