Python: creating a function to unstack a DataFrame

python pandas

I currently have a DataFrame that is structured roughly like this:

InvoiceNo  Month  Year  Size
     1       1    2014   7
     2       1    2014   8
     3       2    2014   11
     4       3    2015   9
     5       7    2015   8.5

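and so on.

I am trying to create a function that segments the DataFrame by year, groups by Size and Month, counts the InvoiceNo values, and finally unstacks the result.

What I have been doing is this: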

x = 2014

def Year_calc(df):
    return df[df['Year'] == x].groupby(['Size','Month']).agg({'InvoiceNo': 'count'}).unstack(0).columns.droplevel(0).fillna(0)

Then I call df2014 = Year_calc(df)

But it returns the following output:

Float64Index([], dtype='float64', name='Size')

Can anyone point out what I am doing wrong?

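df.apply passes each row or column to the function as a Series object, depending on the axis specified. It does not pass the whole DataFrame.

If you want to apply the function to the whole DataFrame, then how about

df2014 = Year_calc(df)

You should also consider making the year a parameter of the function, so that it is clear what the Year_calc function does.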
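As a minimal sketch of that suggestion, the function below takes the year as an argument. It also addresses one thing in the original chain that I am assuming contributes to the problem: .columns.droplevel(0) evaluates to the column Index rather than the DataFrame, so chaining .fillna(0) after it returns an Index. The name year_calc and the sample data are illustrative only.

```python
import pandas as pd

# Sample data mirroring the rows shown in the question
df = pd.DataFrame({
    'InvoiceNo': [1, 2, 3, 4, 5],
    'Month': [1, 1, 2, 3, 7],
    'Year': [2014, 2014, 2014, 2015, 2015],
    'Size': [7, 8, 11, 9, 8.5],
})

# df.apply hands the function one column (or row) at a time, as a Series
seen_types = df.apply(lambda col: type(col).__name__)
print(seen_types)


def year_calc(df, year):
    """Count invoices per Month (rows) and Size (columns) for one year."""
    counts = (df[df['Year'] == year]
              .groupby(['Size', 'Month'])
              .agg({'InvoiceNo': 'count'})
              .unstack(0))
    # droplevel returns a new Index, so assign it back to the frame
    # instead of continuing the method chain on it
    counts.columns = counts.columns.droplevel(0)
    return counts.fillna(0)


df2014 = year_calc(df, 2014)
print(df2014)
```

Calling the function directly, rather than through df.apply, means it receives the whole DataFrame and can filter on the Year column.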

Use groupby, count and unstack:

res = df.groupby(['Year', 'Size', 'Month',]).InvoiceNo.count().unstack(0, fill_value=0)
res

Year        2014  2015
Size Month            
7.0  1         1     0
8.0  1         1     0
8.5  7         0     1
9.0  3         0     1
11.0 2         1     0

Or, equivalently, with pivot_table:

res = df.pivot_table(index=['Size', 'Month'], 
                     columns='Year', 
                     values='InvoiceNo', 
                     aggfunc='count', 
                     fill_value=0)

Year        2014  2015
Size Month            
7.0  1         1     0
8.0  1         1     0
8.5  7         0     1
9.0  3         0     1
11.0 2         1     0
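A quick way to convince yourself that the two approaches agree is to build both tables and compare them. This is only a sketch on the sample rows from the question; the names by_groupby, by_pivot and aligned are illustrative, and the pivot result is aligned to the groupby result first so that the comparison does not depend on row order.

```python
import pandas as pd

df = pd.DataFrame({
    'InvoiceNo': [1, 2, 3, 4, 5],
    'Month': [1, 1, 2, 3, 7],
    'Year': [2014, 2014, 2014, 2015, 2015],
    'Size': [7, 8, 11, 9, 8.5],
})

# Count invoices per (Year, Size, Month), then move Year into the columns
by_groupby = (df.groupby(['Year', 'Size', 'Month'])
                .InvoiceNo
                .count()
                .unstack(0, fill_value=0))

# The same table built with pivot_table
by_pivot = df.pivot_table(index=['Size', 'Month'],
                          columns='Year',
                          values='InvoiceNo',
                          aggfunc='count',
                          fill_value=0)

# Align labels before comparing cell by cell
aligned = by_pivot.reindex(index=by_groupby.index, columns=by_groupby.columns)
same = bool((aligned == by_groupby).all().all())
print(same)
```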

Compare like this:

res[2014] > res[2015]
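To make the comparison concrete, here is a small sketch on the sample rows from the question. It rebuilds res and evaluates the comparison, which yields a boolean Series indexed by (Size, Month). The name more_in_2014 is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'InvoiceNo': [1, 2, 3, 4, 5],
    'Month': [1, 1, 2, 3, 7],
    'Year': [2014, 2014, 2014, 2015, 2015],
    'Size': [7, 8, 11, 9, 8.5],
})

res = (df.groupby(['Year', 'Size', 'Month'])
         .InvoiceNo
         .count()
         .unstack(0, fill_value=0))

# Element-wise comparison of the two year columns
more_in_2014 = res[2014] > res[2015]
print(more_in_2014)

# Keep only the (Size, Month) combinations where 2014 had more invoices
print(res[more_in_2014])
```

The boolean Series can be used directly as a row mask on res, as in the last line.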

Or, just compute it for the year you want:

(df[df.Year.eq(2014)]
     .groupby(['Size', 'Month'])
     .InvoiceNo
     .count()
     .unstack(1, fill_value=0))

Month  1  2
Size       
7.0    1  0
8.0    1  0
11.0   0  1

Here is the input data:

import pandas as pd

d = {'InvoiceNo':[1,2,3,4,5],'Month':[1,1,2,3,7],'Year':[2014,2014,2014,2015,2015],'Size':[7,8,11,9,8.5]}
df = pd.DataFrame(data = d)

Solution 1:

Using the previous answers and the elements you gave, here is the function I managed to write:

def Year_calc(data, year):
    # keep only the rows for the given year, then group by Size and Month
    t1 = data.loc[data.Year == year].groupby(['Size', 'Month'])

    # count the invoices in each group and move Size into the columns
    t2 = t1.InvoiceNo.count().unstack(0, fill_value=0)
    return t2

Here is the table returned for 2014:

Size   7.0   8.0   11.0
Month                  
1         1     1     0
2         0     0     1

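Solution 2:

Since you removed the year as a parameter, some adjustment seems in order. You can either select the rows by year before doing the group by, or group by Year, Month and Size and then select the rows corresponding to the year you want.

def Year_calc(data):
    # group by Year, Month and Size
    t1 = data.groupby(['Year', 'Month', 'Size'])

    # count the invoices in each group (all years) and move Size into the columns
    t2 = t1.InvoiceNo.count().unstack(2, fill_value=0)
    return t2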

The unfiltered output would be:

Size        7.0   8.0   8.5   9.0   11.0
Year Month
2014 1        1     1     0     0      0
     2        0     0     0     0      1
2015 3        0     0     0     1      0
     7        0     0     1     0      0

Suppose you need the 2015 data; then type:

tdf = Year_calc(data = df)
tdf.xs(2015)
# or
tdf.loc[(2015,), :]

The result returned is:

Size   7.0   8.0   8.5   9.0   11.0
Month
3        0     0     0     1      0
7        0     0     1     0      0
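The selection step can be sketched end to end as follows. This is an illustration on the sample rows, not the only way to slice a MultiIndex: it uses xs with an explicit level name and plain label indexing with loc on the first level, and checks that both give the same table. The names year_calc, by_xs and by_loc are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'InvoiceNo': [1, 2, 3, 4, 5],
    'Month': [1, 1, 2, 3, 7],
    'Year': [2014, 2014, 2014, 2015, 2015],
    'Size': [7, 8, 11, 9, 8.5],
})


def year_calc(data):
    # Count invoices per (Year, Month, Size) and move Size into the columns
    return (data.groupby(['Year', 'Month', 'Size'])
                .InvoiceNo
                .count()
                .unstack(2, fill_value=0))


tdf = year_calc(df)

# xs selects one label from a level and drops that level by default
by_xs = tdf.xs(2015, level='Year')

# Label indexing on the first level of the row MultiIndex
by_loc = tdf.loc[2015]

print(by_xs)
print(by_xs.equals(by_loc))
```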

Please see this post on MultiIndex slicing:

Hope this helps.

In your df it is InvoiceNO, not InvoiceNo.

Apologies, that was a typo. In my actual notebook both are specified as 'InvoiceNo'.

Thanks Erik. I no longer get the error, but now I get an array that looks like: Float64Index([], dtype='float64', name='Size')

Agreed. There is no need to create a function for the above.

This works, but I am trying to create a separate DataFrame for each year for comparison.

@KyleMcComb I gave you some options. Really, slicing out the columns you want to compare should not be too hard.

@KyleMcComb Do df.Year.unique() and you will have a list of all the years.

I had to remove year from the parameters, but after doing so it returned a flat frame with two vertical dimensions, Month and Size, and no data in the frame.
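For the goal mentioned in the comments, one separate table per year, a possible sketch is to loop over df.Year.unique() and store each result in a dictionary keyed by year. The function name year_calc and the dictionary name frames are illustrative, and the data is the sample from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'InvoiceNo': [1, 2, 3, 4, 5],
    'Month': [1, 1, 2, 3, 7],
    'Year': [2014, 2014, 2014, 2015, 2015],
    'Size': [7, 8, 11, 9, 8.5],
})


def year_calc(data, year):
    # Rows for one year, counted per Month (rows) and Size (columns)
    return (data.loc[data.Year == year]
                .groupby(['Size', 'Month'])
                .InvoiceNo
                .count()
                .unstack(0, fill_value=0))


# One table per distinct year, keyed by the year itself
frames = {int(year): year_calc(df, year) for year in df.Year.unique()}

print(frames[2014])
print(frames[2015])
```

Keeping the tables in a dictionary avoids creating one variable per year by hand, and each entry can still be pulled out for comparison.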