Python: find the top n clients for the year, then show those clients' sales volume for each month of the year

Good morning,

I want to report on this year's top n clients, and then show how those top n clients performed over the whole year. Sample df:

import pandas as pd
dfTest = [
             ('Client', ['A','A','A','A',
                         'B','B','B','B',
                         'C','C','C','C',
                         'D','D','D','D']),
            ('Year_Month', ['2018-08', '2018-09', '2018-10','2018-11',
                             '2018-08', '2018-09', '2018-10','2018-11',
                             '2018-08', '2018-09', '2018-10', '2018-11',
                             '2018-08', '2018-09', '2018-10', '2018-11']),
            ('Volume', [100, 200, 300,400,
                        1, 2, 3,4,
                        10, 20, 30,40,
                        1000, 2000, 3000,4000]
            ),
            ('state', ['Done', 'Tied Done', 'Tied Done','Done',
                       'Passed', 'Done', 'Passed', 'Done',
                       'Rejected', 'Done', 'Passed', 'Done',
                       'Done', 'Done', 'Done', 'Done']
            )
          ]
# DataFrame.from_items was removed in pandas 1.0; a dict keeps the column order (Python 3.7+)
df = pd.DataFrame(dict(dfTest))
print(df)

   Client Year_Month  Volume      state
0       A    2018-08     100       Done
1       A    2018-09     200  Tied Done
2       A    2018-10     300  Tied Done
3       A    2018-11     400       Done
4       B    2018-08       1     Passed
5       B    2018-09       2       Done
6       B    2018-10       3     Passed
7       B    2018-11       4       Done
8       C    2018-08      10   Rejected
9       C    2018-09      20       Done
10      C    2018-10      30     Passed
11      C    2018-11      40       Done
12      D    2018-08    1000       Done
13      D    2018-09    2000       Done
14      D    2018-10    3000       Done
15      D    2018-11    4000       Done
Now determine the top n (e.g. 2) clients by volume of completed trades:

d = [
    ('Done_Volume', 'sum')
]
# first keep rows whose state is exactly 'Done' or 'Tied Done', then aggregate the filtered df
mask = ((df['state'] == 'Done') | (df['state'] == 'Tied Done'))
df_Client_Done_Volume = df[mask].groupby(['Client'])['Volume'].agg(d)
print(df_Client_Done_Volume)

        Done_Volume
Client             
A              1000
B                 6
C                60
D             10000

print(df_Client_Done_Volume.nlargest(2, 'Done_Volume'))

        Done_Volume
Client             
D             10000
A              1000
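If only the client names are needed, the ranking can go straight from the grouped sums to a list. A minimal sketch, assuming pandas 0.25 or later for named aggregation (`agg(Done_Volume='sum')`); the compact rebuild of the sample frame and the names `top_clients` and `done_volume` are illustrative only:

```python
import pandas as pd

# Compact rebuild of the question's sample frame
months = ['2018-08', '2018-09', '2018-10', '2018-11']
df = pd.DataFrame({
    'Client': [c for c in 'ABCD' for _ in months],
    'Year_Month': months * 4,
    'Volume': [100, 200, 300, 400, 1, 2, 3, 4,
               10, 20, 30, 40, 1000, 2000, 3000, 4000],
    'state': ['Done', 'Tied Done', 'Tied Done', 'Done',
              'Passed', 'Done', 'Passed', 'Done',
              'Rejected', 'Done', 'Passed', 'Done',
              'Done', 'Done', 'Done', 'Done'],
})

n = 2
# isin() selects the same rows as (== 'Done') | (== 'Tied Done')
mask = df['state'].isin(['Done', 'Tied Done'])
# Named aggregation: the keyword becomes the output column name
done_volume = df[mask].groupby('Client')['Volume'].agg(Done_Volume='sum')
# nlargest returns the n biggest totals in descending order
top_clients = done_volume['Done_Volume'].nlargest(n).index.tolist()
print(top_clients)  # ['D', 'A']
```

If several clients tie at the cutoff, `nlargest(n, keep='all')` keeps all of them, so the list can then be longer than n.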
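So clients A and D are my top two (n) performers. Now I want to feed this list or df back into the original data to retrieve their performance over the year, with Year_Month across the top and the clients as rows: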

Client  2018-08 2018-09 2018-10 2018-11
A       100     200     300     400
D       1000    2000    3000    4000
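A minimal sketch of that reshaping, assuming A and D are already known to be the top clients (hard-coded here for illustration) and that each client has at most one row per month, which `DataFrame.pivot` requires; with duplicate months, `pivot_table(..., aggfunc='sum')` would be needed instead. It keeps rows of every state; filter by state first if only completed volume should appear:

```python
import pandas as pd

# Compact rebuild of the question's sample frame
months = ['2018-08', '2018-09', '2018-10', '2018-11']
df = pd.DataFrame({
    'Client': [c for c in 'ABCD' for _ in months],
    'Year_Month': months * 4,
    'Volume': [100, 200, 300, 400, 1, 2, 3, 4,
               10, 20, 30, 40, 1000, 2000, 3000, 4000],
    'state': ['Done', 'Tied Done', 'Tied Done', 'Done',
              'Passed', 'Done', 'Passed', 'Done',
              'Rejected', 'Done', 'Passed', 'Done',
              'Done', 'Done', 'Done', 'Done'],
})

# Hypothetical list of top clients, e.g. the index of nlargest(2, 'Done_Volume')
top_clients = ['D', 'A']

# Keep only those clients' rows, then spread the months into columns
wide = (df[df['Client'].isin(top_clients)]
        .pivot(index='Client', columns='Year_Month', values='Volume'))
print(wide)
```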
IIUC
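The code for this answer did not survive extraction. The comments at the end mention `s.sum(1).nlargest(2).index`, which points to a pivot-first approach. The following is a hypothetical reconstruction of that idea, not the answerer's original code; ranking on a separate Done/Tied Done pivot is an added assumption about the intent:

```python
import pandas as pd

# Compact rebuild of the question's sample frame
months = ['2018-08', '2018-09', '2018-10', '2018-11']
df = pd.DataFrame({
    'Client': [c for c in 'ABCD' for _ in months],
    'Year_Month': months * 4,
    'Volume': [100, 200, 300, 400, 1, 2, 3, 4,
               10, 20, 30, 40, 1000, 2000, 3000, 4000],
    'state': ['Done', 'Tied Done', 'Tied Done', 'Done',
              'Passed', 'Done', 'Passed', 'Done',
              'Rejected', 'Done', 'Passed', 'Done',
              'Done', 'Done', 'Done', 'Done'],
})

# s: one row per client, one column per month, volume from every state
s = df.pivot_table(index='Client', columns='Year_Month',
                   values='Volume', aggfunc='sum')

# Same layout but counting only Done / Tied Done rows;
# months without a qualifying row become NaN
done = (df[df['state'].isin(['Done', 'Tied Done'])]
        .pivot_table(index='Client', columns='Year_Month',
                     values='Volume', aggfunc='sum'))

# sum(axis=1) adds each row across all month columns (NaN is skipped),
# so the ranking covers the whole year
top = done.sum(axis=1).nlargest(2).index
result = s.loc[top]
print(result)
```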

You need a method for this.

Here is my suggestion:

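import pandas as pd

def get_top_n_performer(df, n):
    # keep only completed trades
    df_done = df[df['state'].isin(['Done', 'Tied Done'])]
    # total Volume per client; a dict-of-lists agg gives a ('Volume', 'sum') column
    aggs = {'Volume': ['sum']}
    data = df_done.groupby('Client').agg(aggs)
    data = data.reset_index()
    # flatten the two-level column labels
    data.columns = ['Client', 'Volume_sum']
    data = data.sort_values(by='Volume_sum', ascending=False)
    return data.head(n)

# names of the top n clients, best first
ls = list(get_top_n_performer(df, 2).Client.values)

# monthly volume of those clients: clients as rows, Year_Month as columns;
# aggfunc='sum' adds up several rows in one month (the default would average them)
data = pd.pivot_table(df[df['Client'].isin(ls)], values='Volume', index=['Client'],
                      columns=['Year_Month'], aggfunc='sum')
data = data.reset_index()

print(data)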
Output:

Year_Month Client  2018-08  2018-09  2018-10  2018-11
0               A      100      200      300      400
1               D     1000     2000     3000     4000
I hope this helps.
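One detail in the output above: `pivot_table` sorts the index alphabetically, so A appears before D even though D has the larger completed volume. If the rows should follow the ranking, a sketch assuming `ls` holds the client names best-first (as `get_top_n_performer` returns them; hard-coded here) is to `reindex` before resetting the index:

```python
import pandas as pd

# Compact rebuild of the question's sample frame
months = ['2018-08', '2018-09', '2018-10', '2018-11']
df = pd.DataFrame({
    'Client': [c for c in 'ABCD' for _ in months],
    'Year_Month': months * 4,
    'Volume': [100, 200, 300, 400, 1, 2, 3, 4,
               10, 20, 30, 40, 1000, 2000, 3000, 4000],
    'state': ['Done', 'Tied Done', 'Tied Done', 'Done',
              'Passed', 'Done', 'Passed', 'Done',
              'Rejected', 'Done', 'Passed', 'Done',
              'Done', 'Done', 'Done', 'Done'],
})

ls = ['D', 'A']  # ranking order, best first
data = pd.pivot_table(df[df['Client'].isin(ls)], values='Volume',
                      index='Client', columns='Year_Month', aggfunc='sum')
# pivot_table sorted the rows as A, D; reindex(ls) restores the ranking order
ranked = data.reindex(ls).reset_index()
print(ranked)
```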

Many thanks @Wen-Ben. Is s.sum(1).nlargest(2).index summed over the whole year?

Could you help me with the following question?

Sure, let me try after lunch.

@panda let me look at that.

@Wen-Ben Thank you

Thanks @CHAMI Soufiane, this returned the correct results on my large dataset.

I'm glad this helped!