Python pandas groupby with a MultiIndex

I selected some data in Spark as follows:

base = spark.sql("""
    SELECT
        ...
        ...
""")
print(base.count())
base.cache()
base = base.toPandas()
base['yyyy_mm_dd'] = pd.to_datetime(base['yyyy_mm_dd'])
base.set_index("yyyy_mm_dd", inplace=True)
This gives me a DataFrame that looks like this:

              id    aggregated_field    aggregated_field2
yyyy_mm_dd
I'd like to group by yyyy_mm_dd and id, but sum the aggregated fields, so that I can see the sum of the aggregated fields for each provider per day. Then I'd like to roll this up to monthly. This is what I did:

agg = base.groupby(['yyyy_mm_dd', 'id'])[['aggregated_field','aggregated_field2']].sum()
My DataFrame now looks like this:

                  aggregated_field    aggregated_field2
yyyy_mm_dd  id
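For readers following along, the grouping step above can be reproduced on a small made-up frame. The column names match the question; the values are invented:

```python
import pandas as pd

# Invented sample standing in for the Spark extract in the question
base = pd.DataFrame({
    "yyyy_mm_dd": pd.to_datetime(
        ["2019-01-01", "2019-01-01", "2019-01-02", "2019-02-01"]),
    "id": [1, 2, 1, 1],
    "aggregated_field": [10, 20, 30, 40],
    "aggregated_field2": [1, 2, 3, 4],
}).set_index("yyyy_mm_dd")

# Daily sum per id; the result has a (yyyy_mm_dd, id) MultiIndex
agg = base.groupby(["yyyy_mm_dd", "id"])[
    ["aggregated_field", "aggregated_field2"]].sum()
```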
Finally, I try to resample() monthly:

agg = agg.resample('M').sum()
Then I get this error:

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'MultiIndex'

I'm not sure why, since I converted yyyy_mm_dd to a datetime index earlier.
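The error occurs because resample() requires the frame's index to be a plain DatetimeIndex (or Timedelta/PeriodIndex); after the two-key groupby the index is a MultiIndex. One workaround (a sketch, not taken from the original thread) is to group again with pd.Grouper pointed at the date level:

```python
import pandas as pd

# Invented frame with the same (yyyy_mm_dd, id) MultiIndex shape as agg
agg = pd.DataFrame(
    {"aggregated_field": [10, 30, 40], "aggregated_field2": [1, 3, 4]},
    index=pd.MultiIndex.from_tuples(
        [(pd.Timestamp("2019-01-01"), 1),
         (pd.Timestamp("2019-01-02"), 1),
         (pd.Timestamp("2019-02-01"), 2)],
        names=["yyyy_mm_dd", "id"],
    ),
)

# pd.Grouper can target a single level of a MultiIndex, which
# resample() cannot; this sums each id by calendar month
monthly = agg.groupby(
    [pd.Grouper(level="yyyy_mm_dd", freq="M"), "id"]).sum()
```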

Edit: the output I want is:

yyyy_mm_dd    id   aggregated_metric    aggregated_metric2
2019-01-01    1    ...                  ...
              2
              3
2019-01-02    1
              2
              3

Maybe you'll find this useful:

Solution 1 (adopts pd.Period, and displays the monthly data in its "proper" format)
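The linked solution itself isn't reproduced in the thread, but a minimal sketch of the pd.Period approach, on invented sample data, could look like this:

```python
import pandas as pd

# Invented daily aggregate standing in for agg in the question
agg = pd.DataFrame({
    "yyyy_mm_dd": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-02-01"]),
    "id": [1, 1, 2],
    "aggregated_field": [10, 30, 40],
    "aggregated_field2": [1, 3, 4],
}).set_index(["yyyy_mm_dd", "id"])

# Collapse the date level to a monthly Period, then re-group;
# the result is indexed by months like Period("2019-01")
monthly = (
    agg.reset_index()
       .assign(month=lambda d: d["yyyy_mm_dd"].dt.to_period("M"))
       .groupby(["month", "id"])[["aggregated_field", "aggregated_field2"]]
       .sum()
)
```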

Solution 2 (sticks with datetime64)
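A sketch of the datetime64 variant, again on invented data: without a key, pd.Grouper(freq="M") bins the frame's own DatetimeIndex, so the months stay as month-end timestamps rather than Periods:

```python
import pandas as pd

# Invented daily data with a DatetimeIndex, as in the question
base = pd.DataFrame(
    {"id": [1, 1, 2],
     "aggregated_field": [10, 30, 40],
     "aggregated_field2": [1, 3, 4]},
    index=pd.to_datetime(["2019-01-01", "2019-01-02", "2019-02-01"]),
)
base.index.name = "yyyy_mm_dd"

# pd.Grouper(freq="M") with no key uses the DatetimeIndex itself;
# month labels come out as datetime64 month-ends (e.g. 2019-01-31)
monthly = base.groupby(["id", pd.Grouper(freq="M")])[
    ["aggregated_field", "aggregated_field2"]].sum()
```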


agg.groupby('id').resample('M').sum()? Or base.groupby('id').resample('M').sum()?

@QuangHoang Neither seems to work: your first suggestion gives the same error, and the second one sums the id field as well.
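Regarding the id column getting summed by base.groupby('id').resample('M').sum(): selecting only the value columns on the grouped resampler keeps id out of the aggregation. A sketch with invented data:

```python
import pandas as pd

# Invented daily data with a DatetimeIndex, as in the question
base = pd.DataFrame(
    {"id": [1, 1, 2],
     "aggregated_field": [10, 30, 40],
     "aggregated_field2": [1, 3, 4]},
    index=pd.to_datetime(["2019-01-01", "2019-02-01", "2019-01-02"]),
)
base.index.name = "yyyy_mm_dd"

# Column selection on the grouped resampler excludes id from the sum
monthly = base.groupby("id").resample("M")[
    ["aggregated_field", "aggregated_field2"]].sum()
```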