Python: group daily data by month and count objects per user

python pandas

I am trying to count products grouped by month and by user. My data is daily, so I first need to group it into months and then by user. See the table below:

Date         UserID Product
2016-02-02  1   Chocolate
2016-03-03  22  Chocolate
2016-03-03  22  Banana
2016-03-03  22  Banana
2016-03-03  22  Chocolate
2016-04-03  22  Chocolate
2016-04-03  22  Banana
2016-04-03  33  Banana
2016-04-03  33  Chocolate
2016-04-03  22  Peanuts
2016-04-03  33  Peanuts
2016-04-03  33  Peanuts
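
For anyone who wants to reproduce the problem, the table above could be loaded roughly as follows. This is only a sketch: reading the data from whitespace-separated text and the variable name `raw` are assumptions, not something stated in the question.

```python
import io

import pandas as pd

# The sample table from the question, as whitespace-separated text.
raw = """Date UserID Product
2016-02-02 1 Chocolate
2016-03-03 22 Chocolate
2016-03-03 22 Banana
2016-03-03 22 Banana
2016-03-03 22 Chocolate
2016-04-03 22 Chocolate
2016-04-03 22 Banana
2016-04-03 33 Banana
2016-04-03 33 Chocolate
2016-04-03 22 Peanuts
2016-04-03 33 Peanuts
2016-04-03 33 Peanuts
"""

# sep=r"\s+" splits on any run of whitespace; parse_dates turns Date into datetime64.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", parse_dates=["Date"])
print(df.dtypes)
```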

My result should be:

Date     UserID   Product     Count
2016-03  22       Banana      2
2016-03  22       Chocolate   2
2016-04  22       Banana      1
2016-04  22       Peanuts     1
2016-04  33       Banana      1
2016-04  33       Peanuts     2
2016-04  33       Chocolate   1

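I need to do this with Python and pandas, but I can't get it to work.

With this code

dfcount = df.groupby(['Date','UserID','Product']).Kit.count()

I get counts per day, but how can I count per month?

I tried this: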

df[['Date', 'UserID', 'Product']].groupby(pd.Grouper(key='Date', freq='1M')).sum().sort_values(by='Date', ascending=True)['Product']

It doesn't work.

It reports that it cannot find my Product column, but my grouping may be wrong as well:

KeyError: 'Product'
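
One plausible reading of the error (an assumption, since the full traceback is not shown) is that grouping only by Date and calling sum() leaves no Product column to select afterwards. A minimal sketch of keeping pd.Grouper but adding UserID and Product to the grouping keys is shown below. The data is a small subset of the question's table, and freq="MS" (month start) is chosen here only because the month-end alias differs between pandas versions.

```python
import pandas as pd

# Small subset of the question's data, used only for illustration.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2016-03-03", "2016-03-03", "2016-04-03", "2016-04-03", "2016-04-03"]
    ),
    "UserID": [22, 22, 33, 33, 33],
    "Product": ["Banana", "Banana", "Peanuts", "Peanuts", "Banana"],
})

# Group by calendar month (labelled by its first day), user and product,
# then count the rows in each group with size().
counts = df.groupby(
    [pd.Grouper(key="Date", freq="MS"), "UserID", "Product"]
).size()
print(counts)
```

Because Product is a grouping key here rather than an aggregated column, it ends up in the index and does not need to be selected afterwards.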

If Date is a string, you can do:

df.groupby([df.Date.str[:7], 'UserID', 'Product']).count()

                          Date
Date    UserID Product        
2016-02 1      Chocolate     1
2016-03 22     Banana        2
               Chocolate     2
2016-04 22     Banana        1
               Chocolate     1
               Peanuts       1
        33     Banana        1
               Chocolate     1
               Peanuts       2

With a datetime column:

df.groupby([df.Date.dt.to_period('M'), 'UserID', 'Product']).count()

Output:

+---+---------+--------+-----------+-------+
|   |  Date   | UserID |  Product  | Count |
+---+---------+--------+-----------+-------+
| 0 | 2016-02 |      1 | Chocolate |     1 |
| 1 | 2016-03 |     22 | Banana    |     2 |
| 2 | 2016-03 |     22 | Chocolate |     2 |
| 3 | 2016-04 |     22 | Banana    |     1 |
| 4 | 2016-04 |     22 | Chocolate |     1 |
| 5 | 2016-04 |     22 | Peanuts   |     1 |
| 6 | 2016-04 |     33 | Banana    |     1 |
| 7 | 2016-04 |     33 | Chocolate |     1 |
| 8 | 2016-04 |     33 | Peanuts   |     2 |
+---+---------+--------+-----------+-------+
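
A flat table like the one above, with a Count column, could be produced along these lines. This is a sketch rather than the answerer's exact code: it uses size() with reset_index(name="Count"), and the variable name `monthly` is made up for illustration.

```python
import pandas as pd

# The sample data from the question.
df = pd.DataFrame({
    "Date": ["2016-02-02"] + ["2016-03-03"] * 4 + ["2016-04-03"] * 7,
    "UserID": [1, 22, 22, 22, 22, 22, 22, 33, 33, 22, 33, 33],
    "Product": [
        "Chocolate", "Chocolate", "Banana", "Banana", "Chocolate", "Chocolate",
        "Banana", "Banana", "Chocolate", "Peanuts", "Peanuts", "Peanuts",
    ],
})
df["Date"] = pd.to_datetime(df["Date"])

# to_period("M") maps every day to its year-month period, e.g. 2016-03.
monthly = (
    df.groupby([df["Date"].dt.to_period("M"), "UserID", "Product"])
      .size()                      # number of rows per group
      .reset_index(name="Count")   # flatten the index and name the counts
)
print(monthly)
```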

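I would first convert the column to datetime, because that makes it easy to extract the year, month or day (via the .dt accessor, for example df.Date.dt.month).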
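
As a quick illustration of that conversion, assuming the Date column starts out as strings (the three dates below are taken from the question's table):

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2016-02-02", "2016-03-03", "2016-04-03"]})

# Parse the strings into datetime64 values.
df["Date"] = pd.to_datetime(df["Date"])

# The .dt accessor exposes the date components as integer Series.
print(df["Date"].dt.year.tolist())
print(df["Date"].dt.month.tolist())
print(df["Date"].dt.day.tolist())
```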

Then group by month, user and product:

counts = (df.groupby([df.Date.dt.month, 
                      'UserID', 
                      'Product']).count())
print(counts)

                       Date
Date UserID Product        
2    1      Chocolate     1
3    22     Banana        2
            Chocolate     2
4    22     Banana        1
            Chocolate     1
            Peanuts       1
     33     Banana        1
            Chocolate     1
            Peanuts       2

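Note that if your data spans more than one year, the solution above still groups by month only, so the same month from different years falls into one group. If you instead want to group products and users by both year and month in such an extended dataset, just add the year extraction to the groupby, like this: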

counts = (df.groupby([df.Date.dt.year, 
                      df.Date.dt.month, 
                      'UserID', 
                      'Product']).count())

print(counts)

                            Date
Date Date UserID Product        
2016 2    1      Chocolate     1
     3    22     Banana        2
                 Chocolate     2
     4    22     Banana        1
                 Chocolate     1
                 Peanuts       1
          33     Banana        1
                 Chocolate     1
                 Peanuts       2
2017 2    1      Chocolate     1
     3    22     Banana        2
                 Chocolate     1

This way you are more explicit about how the data is grouped, so unexpected results are less likely later on.
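
One cosmetic detail in the output above is that both index levels are called Date. A possible variation, sketched here with made-up two-year data and the hypothetical level names Year and Month, is to rename the extracted Series before grouping:

```python
import pandas as pd

# Made-up data spanning two years, only to show the effect.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2016-03-03", "2016-03-03", "2017-03-03"]),
    "UserID": [22, 22, 22],
    "Product": ["Banana", "Banana", "Banana"],
})

counts = (
    df.groupby([
        df["Date"].dt.year.rename("Year"),    # first grouping level
        df["Date"].dt.month.rename("Month"),  # second grouping level
        "UserID",
        "Product",
    ])
    .size()
    .reset_index(name="Count")
)
print(counts)
```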

Try:

df.groupby(['Date', 'UserID', 'Product']).count().reset_index()

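Both solutions work for the given example, but note that they actually group by year and month, not by month alone. So if you have several years of data and want to group by month only, use df.Date.str[5:7] for a string column, or df.groupby([df.Date.dt.month, 'UserID', 'Product']).count() for a datetime column.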
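
The difference between the two slices can be seen on a tiny made-up example with the same month in two different years (the data and variable names below are illustrative only):

```python
import pandas as pd

# Same month, two different years; Date is kept as a string here.
df = pd.DataFrame({
    "Date": ["2016-03-03", "2017-03-03"],
    "UserID": [22, 22],
    "Product": ["Banana", "Banana"],
})

# str[:7] keeps "YYYY-MM", so the two years stay in separate groups.
by_year_month = df.groupby([df["Date"].str[:7], "UserID", "Product"]).size()

# str[5:7] keeps only "MM", so both rows fall into the same group.
by_month = df.groupby([df["Date"].str[5:7], "UserID", "Product"]).size()

print(by_year_month)
print(by_month)
```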

Alternatively, you could add a 'Count' column to every row and compute the counts later with df.groupby(…).agg({'Count': 'sum'}).

df.groupby(…).size() may give the same result, but you would have to rename the last column to 'Count'.
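
A minimal sketch comparing the two approaches from these comments, on made-up data and with illustrative variable names. Here reset_index(name="Count") takes care of the renaming in the size() variant:

```python
import pandas as pd

# Made-up data, only to compare the two ways of counting.
df = pd.DataFrame({
    "UserID": [22, 22, 33],
    "Product": ["Banana", "Banana", "Peanuts"],
})

# Variant 1: size() counts rows per group; name the result column "Count".
via_size = df.groupby(["UserID", "Product"]).size().reset_index(name="Count")

# Variant 2: add a column of ones to every row and sum it per group.
via_sum = (
    df.assign(Count=1)
      .groupby(["UserID", "Product"])
      .agg({"Count": "sum"})
      .reset_index()
)

print(via_size)
print(via_sum)
```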