How to calculate the highest total sales for last month's data with Python

python pandas numpy

I have a data file named store_data.csv containing thousands of rows. Sample data is shown below:

Date       Store1   Store2   Store3   Store4
2018-06-01 2643     1642     2678     3050
2018-07-16 6442     5413     5784     7684
2018-07-24 4587     5743     3948     6124
2018-08-12 3547     8743     7462     8315

How can I use Python to find which store had the highest total sales in the last month?

First, create a DatetimeIndex:

#if necessary, convert the column first and then make it the index
#df['Date'] = pd.to_datetime(df['Date'])
#df = df.set_index('Date')

print (df)
            Store1  Store2  Store3  Store4
Date                                      
2018-06-01    2643    1642    2678    3050
2018-07-16    6442    5413    5784    7684
2018-08-10    4587    5743    3948    6124 <- date changed to make a better sample
2018-08-12    3547    8743    7462    8315

print (df.index)
DatetimeIndex(['2018-06-01', '2018-07-16', '2018-08-10', '2018-08-12'], 
              dtype='datetime64[ns]', name='Date', freq=None)
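The DatetimeIndex can also be built while the file is read, so no separate conversion step is needed. The following is a minimal sketch. It assumes that store_data.csv is comma-separated and has a column literally named Date. The name load_store_data is hypothetical, and an in-memory string stands in for the real file.

```python
import io
import pandas as pd

def load_store_data(source):
    # parse_dates converts the Date column to datetime64 values,
    # and index_col then makes it the DatetimeIndex in the same call.
    # Assumption: comma-separated file with a "Date" column.
    return pd.read_csv(source, parse_dates=["Date"], index_col="Date")

# In-memory CSV standing in for store_data.csv (hypothetical sample)
sample = io.StringIO(
    "Date,Store1,Store2,Store3,Store4\n"
    "2018-06-01,2643,1642,2678,3050\n"
    "2018-07-16,6442,5413,5784,7684\n"
    "2018-08-10,4587,5743,3948,6124\n"
    "2018-08-12,3547,8743,7462,8315\n"
)
df = load_store_data(sample)
print(df.index)
```

With the real file this becomes `df = load_store_data("store_data.csv")`.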

Filter the last month with last, take the sum, and finally get the column name of the maximum with idxmax:
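Here is a minimal sketch of these three steps, assuming that df already has the DatetimeIndex shown above. In place of df.last, which is deprecated since pandas 2.1, it selects the latest calendar month with a PeriodIndex mask. The function name best_store_last_month is hypothetical.

```python
import pandas as pd

def best_store_last_month(df):
    # Assumes df has a DatetimeIndex and one numeric column per store.
    months = df.index.to_period("M")              # e.g. Period('2018-08', 'M')
    last_month = df.loc[months == months.max()]   # rows of the latest month only
    totals = last_month.sum()                     # total sales per store (a Series)
    return totals.idxmax()                        # column label of the largest total

df = pd.DataFrame(
    {
        "Store1": [2643, 6442, 4587, 3547],
        "Store2": [1642, 5413, 5743, 8743],
        "Store3": [2678, 5784, 3948, 7462],
        "Store4": [3050, 7684, 6124, 8315],
    },
    index=pd.DatetimeIndex(
        ["2018-06-01", "2018-07-16", "2018-08-10", "2018-08-12"], name="Date"
    ),
)
print(best_store_last_month(df))  # Store2
```

For August the totals are 8134, 14486, 11410 and 14439, so Store2 wins narrowly over Store4.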

Thanks to @Jon Clements for another solution:

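#note: df.last is deprecated since pandas 2.1, and the 'M' resample alias since pandas 2.2 (use 'ME')
out = df.last('M').resample('M').sum().T.idxmax()
#if you need a scalar output
out = df.last('M').resample('M').sum().iloc[0].idxmax()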
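The `.T.idxmax()` form returns one winner per month, indexed by month. The sketch below produces the same kind of per-month result for every month in the data. It groups by a monthly PeriodIndex instead of using df.last and resample('M'). The function name best_store_per_month is hypothetical, and the DatetimeIndex from the answer is assumed.

```python
import pandas as pd

def best_store_per_month(df):
    # Assumes a DatetimeIndex: group rows by calendar month and sum each store.
    monthly = df.groupby(df.index.to_period("M")).sum()
    # For every month (row), take the column label with the largest total.
    return monthly.idxmax(axis=1)

df = pd.DataFrame(
    {
        "Store1": [2643, 6442, 4587, 3547],
        "Store2": [1642, 5413, 5743, 8743],
        "Store3": [2678, 5784, 3948, 7462],
        "Store4": [3050, 7684, 6124, 8315],
    },
    index=pd.DatetimeIndex(
        ["2018-06-01", "2018-07-16", "2018-08-10", "2018-08-12"], name="Date"
    ),
)
winners = best_store_per_month(df)
print(winners)
print(winners.iloc[-1])  # last month only: Store2
```

Because groupby sorts its keys, `winners.iloc[-1]` is always the latest month, which gives the scalar answer to the question.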

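This solution is specific to your problem and a bit rough, but I've tested it and it seems to work for me.

The program finds the store with the highest sales in the last month. It assumes the rows are given in month order (the months are not mixed). If that is a problem, please make the question more specific and I'll see what I can do. One possible approach would be to track each month in a dictionary and then look up the last month's data to find the maximum.
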
import re

def get_highest_sales(filename):
    sales_during_month = []
    with open(filename) as f:
        f.readline()  # Skip the header line
        prev_month = ""
        for line in f:
            line = line.strip()
            if not line:
                continue  # Ignore blank lines
            # Split on runs of whitespace or commas, so both layouts work
            values = re.split(r"[,\s]+", line)
            month = values[0][:7]  # "YYYY-MM", so the year is compared too
            sales = [float(sale) for sale in values[1:]]
            if month != prev_month:
                # A new month starts: reset the running totals
                prev_month = month
                sales_during_month = [0.0] * len(sales)
            for store, sale in enumerate(sales):
                sales_during_month[store] += sale

    # Stores are numbered from 1 in the output
    return "Store: " + str(sales_during_month.index(max(sales_during_month)) + 1)
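The answer also mentions a dictionary-based alternative that does not need the rows to be sorted by month. The following is one way it could be sketched. The function name best_store_from_lines is hypothetical. The function takes any iterable of text lines with the header first, so an open file object works as well as a plain list.

```python
import re

def best_store_from_lines(lines):
    # `lines`: iterable of text rows, header first (e.g. an open file object).
    # Rows may appear in any month order.
    rows = iter(lines)
    next(rows)  # Skip the header line
    totals = {}  # "YYYY-MM" -> list of running totals, one per store
    for line in rows:
        line = line.strip()
        if not line:
            continue  # Ignore blank lines
        values = re.split(r"[,\s]+", line)  # whitespace- or comma-separated
        month = values[0][:7]  # "2018-08-12" -> "2018-08"
        sales = [float(v) for v in values[1:]]
        month_totals = totals.setdefault(month, [0.0] * len(sales))
        for store, sale in enumerate(sales):
            month_totals[store] += sale
    latest = max(totals)  # ISO "YYYY-MM" keys sort chronologically as strings
    latest_totals = totals[latest]
    # Stores are numbered from 1, matching the function above
    return "Store: " + str(latest_totals.index(max(latest_totals)) + 1)

sample = [
    "Date       Store1   Store2   Store3   Store4",
    "2018-08-12 3547     8743     7462     8315",
    "2018-06-01 2643     1642     2678     3050",
    "2018-08-10 4587     5743     3948     6124",
    "2018-07-16 6442     5413     5784     7684",
]
print(best_store_from_lines(sample))  # Store: 2
```

With a real file: `with open("store_data.csv") as f: print(best_store_from_lines(f))`.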

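What have you tried so far? What is your expected output? May I know why this was given -1?
Last index? @pyd - Is there a DatetimeIndex? You may need df = df.set_index('Date')
@pyd - I see now, I only changed the data.
I'm not sure how accurate it is or whether it's better... but starting from your example df, which has a DatetimeIndex, you can do df.last('M').resample('M').sum().T.idxmax() and then work with that... It may allow easier time selection, so you could take "the last 30 days" resampled monthly, "the last 3 months" resampled weekly, or something similar, instead of using index offsets etc. I haven't fully tested it, but thought it worth mentioning... So df.last('M').resample('D').sum().T.idxmax() would give you the best-performing store for each day of the last month, for example... (or something similar, though you may need to decide how resampling should fill or handle the padded values...)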
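The two ideas in the comment above are a per-day winner within the last month and a trailing time window. Both can be sketched without df.last, assuming a DatetimeIndex. The function names are hypothetical. The daily version groups by calendar day rather than calling resample("D"). resample would also insert the days that have no rows, and its sum fills them with 0. idxmax would then report the first column for those empty days, which is the padding issue the comment hints at.

```python
import pandas as pd

def best_store_per_day_last_month(df):
    # Keep only the rows from the latest calendar month.
    months = df.index.to_period("M")
    last_month = df.loc[months == months.max()]
    # Sum per calendar day, only for days that actually have rows.
    daily = last_month.groupby(last_month.index.normalize()).sum()
    return daily.idxmax(axis=1)

def best_store_last_n_days(df, days=30):
    # Trailing window: rows newer than (latest timestamp - `days` days).
    start = df.index.max() - pd.Timedelta(days=days)
    window = df.loc[df.index > start]
    return window.sum().idxmax()

df = pd.DataFrame(
    {
        "Store1": [2643, 6442, 4587, 3547],
        "Store2": [1642, 5413, 5743, 8743],
        "Store3": [2678, 5784, 3948, 7462],
        "Store4": [3050, 7684, 6124, 8315],
    },
    index=pd.DatetimeIndex(
        ["2018-06-01", "2018-07-16", "2018-08-10", "2018-08-12"], name="Date"
    ),
)
print(best_store_per_day_last_month(df))
print(best_store_last_n_days(df, days=30))  # Store4
```

The 30-day window here reaches back into July, which is why its answer differs from the calendar-month answer (Store2).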