Python error: <type 'object'> on dataframe.resample().mean()

I have to fix some legacy code that transforms daily sample data like the following:

sample_data = [
    {
        'id': 10,
        'name': 'example',
        'tags': '["one", "two"]',  # json encoded
        '2016-12-20': 2,
        '2016-12-21': 3,
        '2016-12-22': 10,
        '2016-12-23': 4,
        '2016-12-24': 7,
        '2016-12-25': 5,
        '2016-12-26': 1,
        '2016-12-27': 6,
        '2016-12-28': 4,
        '2016-12-29': 3,
        '2016-12-30': 1,
    },
    {
        'id': 11,
        'name': None,
        'tags': '["one"]',  # json encoded
        '2016-12-20': 6,
        '2016-12-21': 10,
        '2016-12-22': 190,
        '2016-12-23': 77,
        '2016-12-24': 35,
        '2016-12-25': 346,
        '2016-12-26': 6,
        '2016-12-27': 9,
        '2016-12-28': 8,
        '2016-12-29': 3,
        '2016-12-30': 0,
    }
]
The legacy code converts it to weekly data. The code itself looks like this:

import pandas as pd

df = pd.DataFrame(data=sample_data)
df.set_index(['id', 'name', 'tags'], inplace=True)
df.columns = pd.to_datetime(df.columns)  # the remaining columns are the date strings
df = df.replace(0, 1000)
df = df.T.resample('W')  # dates become the index; group them into weekly bins
df = df.mean()
df.index = df.index.strftime('%Y-%m-%d')
df = df.round()
df = df.fillna(method='ffill')
result = df.T.reset_index().to_dict(orient='records')
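However, I get an error during execution. The code processes a large amount of data (>10k rows), and the error seems to occur only occasionally. The traceback is as follows: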

  File "[...]/api/helpers.py", line 277, in resample
    df = df.mean()
  File "[...]/lib/python2.7/site-packages/pandas/tseries/resample.py", line 540, in f
    return self._downsample(_method)
  File "[...]/lib/python2.7/site-packages/pandas/tseries/resample.py", line 693, in _downsample
    self.grouper, axis=self.axis).aggregate(how, **kwargs)
  File "[...]/lib/python2.7/site-packages/pandas/core/groupby.py", line 3704, in aggregate
    return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
  File "[...]/lib/python2.7/site-packages/pandas/core/groupby.py", line 3193, in aggregate
    result, how = self._aggregate(arg, _level=_level, *args, **kwargs)
  File "[...]/lib/python2.7/site-packages/pandas/core/base.py", line 432, in _aggregate
    return getattr(self, arg)(*args, **kwargs), None
  File "[...]/lib/python2.7/site-packages/pandas/core/groupby.py", line 1047, in median
    return self._python_agg_general(f)
  File "[...]/lib/python2.7/site-packages/pandas/core/groupby.py", line 818, in _python_agg_general
    for name, obj in self._iterate_slices():
  File "[...]/lib/python2.7/site-packages/pandas/core/groupby.py", line 3123, in _iterate_slices
    yield val, slicer(val)
  File "[...]/lib/python2.7/site-packages/pandas/core/groupby.py", line 3115, in <lambda>
    slicer = lambda x: self.obj[x]
  File "[...]/lib/python2.7/site-packages/pandas/core/frame.py", line 2057, in __getitem__
    return self._getitem_multilevel(key)
  File "[...]/lib/python2.7/site-packages/pandas/core/frame.py", line 2101, in _getitem_multilevel
    loc = self.columns.get_loc(key)
  File "[...]/lib/python2.7/site-packages/pandas/indexes/multi.py", line 1686, in get_loc
    mask = self.labels[i][loc] == self.levels[i].get_loc(k)
  File "[...]/lib/python2.7/site-packages/pandas/indexes/base.py", line 2136, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4145)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4009)
  File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13166)
  File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13120)
KeyError: <type 'object'>

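No matter what I do, I can't seem to fix it, and I'm not very experienced with pandas. Is there anything wrong with the code that I haven't noticed? Thanks for your time.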

What is your version of pandas? For me it works perfectly with your sample data.

Same here, it looks like a version issue.

numpy==1.11.3; pandas==0.19.2. I should add that the code in question processes a very large amount of data, and the error only occurs when it is given that very large input.

You could try to find out which value of key is causing the KeyError.
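
One way to follow up on that last suggestion is sketched below. It is a diagnostic aid only, not a confirmed fix. The helper names suspect_keys and failing_lookups are hypothetical, and the idea that missing values (such as name=None) or duplicated (id, name, tags) keys are involved is an assumption that the question does not confirm. The first helper lists index keys that contain a missing value or occur more than once. The second replays the per-column label lookup seen in the traceback (self.obj[x]) and collects the keys for which it raises KeyError.

```python
import pandas as pd


def suspect_keys(frame):
    """Return index keys that contain a missing value or occur more than once."""
    keys = frame.index.to_frame(index=False)
    has_missing = keys.isnull().any(axis=1)
    is_duplicate = keys.duplicated(keep=False)
    return keys[has_missing | is_duplicate]


def failing_lookups(frame):
    """Replay the per-column label lookup from the traceback (self.obj[x])
    and collect every column key for which it raises KeyError."""
    failures = []
    for key in frame.columns.unique():
        try:
            frame[key]
        except KeyError as exc:
            failures.append((key, exc))
    return failures


# Hypothetical miniature input: one complete key, one key with name=None,
# and one key that appears twice.
rows = [
    {'id': 10, 'name': 'example', 'tags': '["one"]', '2016-12-20': 2},
    {'id': 11, 'name': None, 'tags': '["one"]', '2016-12-20': 6},
    {'id': 12, 'name': 'dup', 'tags': '[]', '2016-12-20': 1},
    {'id': 12, 'name': 'dup', 'tags': '[]', '2016-12-20': 3},
]
df = pd.DataFrame(rows).set_index(['id', 'name', 'tags'])
suspects = suspect_keys(df)
print(suspects)
```

On the real data, suspect_keys could be called right after set_index, and failing_lookups on the transposed frame just before resample('W'). Whether either one reports anything depends on the input and on the pandas version in use.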