Pandas groupby and sum on a single column and find the max


I am working with a df that looks like this:

InvoiceNo StockCode              Description             Quantity  InvoiceDate         UnitPrice  CustomerID
536365    85123A                       WHITE T-LIGHT          6   2010-12-01 08:26:00       2.55     17850.0
536365     71053                  WHITE METAL LANTERN         6   2010-12-01 08:26:00       3.39     17850.0
536365    84406B                          COAT HANGER         8   2010-12-01 08:26:00       4.73     17850.0
536368    84029G                     HOT WATER BOTTLE         6   2010-12-01 09:41:00       9.11     12391.0
...

I need to find the stock code with the highest total quantity sold. I tried the following code:

clean_data.groupby(['StockCode']).sum().sort_values('Quantity', ascending=False)

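But this also gives me the sums of the other columns, which I don't want. I also tried using

.idxmax()

to find the maximum of the previous statement, but I don't think the answer is accurate.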

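I also need to find the number of unique items sold in each transaction, that is, the number of rows for each unique (InvoiceNo, CustomerID) pair, and I don't know how to start on this. Any insight would be appreciated.

Thanks in advance.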

IIUC：

code_with_max_quant = clean_data.groupby('StockCode')['Quantity'].sum().idxmax()
num_row_with_code = clean_data['StockCode'].eq(code_with_max_quant).sum()

# all rows with max code
clean_data[ clean_data['StockCode'].eq(code_with_max_quant)]

Output:

   InvoiceNo StockCode  Description  Quantity          InvoiceDate  UnitPrice  \
2     536365    84406B  COAT HANGER         8  2010-12-01 08:26:00       4.73   

   CustomerID  
2     17850.0
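
To make the idea above easy to try, here is a minimal sketch built on a reduced copy of the four sample rows from the question (InvoiceDate is left out, and the variable names quantity_per_code, best_code and best_total are only illustrative). It also hints at why the original attempt looked off: calling idxmax on a whole DataFrame returns one label per column, whereas selecting Quantity first yields a single label.

```python
import pandas as pd

# Reduced copy of the sample rows from the question (InvoiceDate omitted).
clean_data = pd.DataFrame({
    "InvoiceNo": [536365, 536365, 536365, 536368],
    "StockCode": ["85123A", "71053", "84406B", "84029G"],
    "Description": ["WHITE T-LIGHT", "WHITE METAL LANTERN",
                    "COAT HANGER", "HOT WATER BOTTLE"],
    "Quantity": [6, 6, 8, 6],
    "UnitPrice": [2.55, 3.39, 4.73, 9.11],
    "CustomerID": [17850.0, 17850.0, 17850.0, 12391.0],
})

# Select the column before aggregating so that only Quantity is summed.
quantity_per_code = clean_data.groupby("StockCode")["Quantity"].sum()

best_code = quantity_per_code.idxmax()  # index label of the largest total
best_total = quantity_per_code.max()    # the largest total itself

print(best_code, best_total)
```

On these four rows the result is 84406B with a total of 8, matching the output shown above.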

For the first part, you can try this:

clean_data.groupby(['StockCode'])['Quantity'].sum().idxmax()

For the second part, try the following:

clean_data.groupby(['InvoiceNo', 'CustomerID'])['StockCode'].nunique()
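
The question mentions both "unique items" and "number of rows" per transaction, and those can differ if an invoice lists the same stock code more than once. The sketch below uses hypothetical rows (invoice 536365 repeats code 71053, which is not in the original data) to show nunique() next to size(); the names distinct_items and row_counts are illustrative.

```python
import pandas as pd

# Hypothetical rows: invoice 536365 lists stock code 71053 twice.
clean_data = pd.DataFrame({
    "InvoiceNo": [536365, 536365, 536365, 536368],
    "CustomerID": [17850.0, 17850.0, 17850.0, 12391.0],
    "StockCode": ["85123A", "71053", "71053", "84029G"],
})

grouped = clean_data.groupby(["InvoiceNo", "CustomerID"])

# Number of distinct stock codes in each transaction.
distinct_items = grouped["StockCode"].nunique()

# Number of rows in each transaction, repeated codes included.
row_counts = grouped.size()

print(distinct_items)
print(row_counts)
```

Note that groupby drops rows whose key is NaN by default, so invoices with a missing CustomerID would not appear in either result; passing dropna=False to groupby (pandas 1.1 or later) keeps them.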

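This will give you a pandas Series indexed by 'StockCode' with the total quantity ordered from largest to smallest:

clean_data.groupby('StockCode')['Quantity'].sum().sort_values(ascending=False)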

Second part:

clean_data.groupby(['InvoiceNo', 'CustomerID'])['StockCode'].unique()

will give you a Series grouped by 'InvoiceNo' and then by 'CustomerID', holding the unique 'StockCode' values of each transaction.
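
Since unique() returns the codes themselves rather than a count, a short sketch may help show how to get from one to the other. It uses a reduced copy of the sample rows from the question; codes_per_txn and counts are illustrative names.

```python
import pandas as pd

# Reduced copy of the sample rows from the question.
clean_data = pd.DataFrame({
    "InvoiceNo": [536365, 536365, 536365, 536368],
    "CustomerID": [17850.0, 17850.0, 17850.0, 12391.0],
    "StockCode": ["85123A", "71053", "84406B", "84029G"],
})

# Each element is a NumPy array of the distinct codes in that transaction,
# in order of first appearance.
codes_per_txn = clean_data.groupby(["InvoiceNo", "CustomerID"])["StockCode"].unique()

# The length of each array is the number of distinct items.
counts = codes_per_txn.apply(len)

print(codes_per_txn)
print(counts)
```

If only the counts are needed, nunique() gives them directly and avoids building the arrays.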

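clean_data.groupby('StockCode')['Quantity'].sum().nlargest(1)

will give you both the id and the value.

Very nice! How can I show part 2 in a chart? I added

.reset_index()

so that

InvoiceNo

is available, and tried the command

items_per_transaction.plot(x='InvoiceNo', y='StockCode', kind='bar')

but it doesn't display anything. When I stop execution, it raises a series of errors.

I think it should work. Just tried it and it works fine. What error are you getting?

It prints all the output after the graph. When I stop the kernel, it gives me a KeyboardInterrupt error. I let it run for a long time, but nothing changed. If it's not too much trouble, could you list the commands you used to generate the dataframe and the graph?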
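
The thread does not say what caused the hang, so the following is only a guess: a bar chart with one bar per invoice can take a very long time to draw when there are thousands of invoices. A minimal sketch, assuming that is the cause, is to plot only the largest few transactions. The data, the TOP_N cutoff and the plot_top helper are all hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the per-transaction counts after .reset_index().
items_per_transaction = pd.DataFrame({
    "InvoiceNo": [536365, 536366, 536367, 536368, 536369],
    "CustomerID": [17850.0, 17850.0, 13047.0, 12391.0, 13047.0],
    "StockCode": [7, 2, 12, 4, 1],  # distinct items per transaction
})

TOP_N = 3  # assumed cutoff; pick whatever keeps the chart readable

# Keep only the TOP_N largest transactions so the chart has few bars.
top = items_per_transaction.nlargest(TOP_N, "StockCode")


def plot_top(frame):
    """Draw the bar chart. matplotlib is imported here so the rest runs without it."""
    import matplotlib.pyplot as plt

    ax = frame.plot(x="InvoiceNo", y="StockCode", kind="bar", legend=False)
    ax.set_ylabel("Distinct items")
    plt.show()
    return ax


print(top)
```

Calling plot_top(top) in a notebook should then draw three bars. If the full distribution matters more than individual invoices, a histogram of the counts is another option.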