Python pandas ValueError: cannot reindex from a duplicate axis - trying to backfill missing dates with zeros


I have an imported CSV file. I parsed the dates and set the index to the parsed date field.

importColumnFields = ['Startdate','Start Time','Data']

pd1 = pd.read_csv(readpath + "/TESTCSV1.csv", index_col=None, usecols=importColumnFields, parse_dates=[['Startdate','Start Time']]).set_index('Startdate_Start Time')

pd1         
        Startdate_Start Time, Data          
        2019-01-01 00:00:00,  2.971
        2019-01-01 01:00:00,  2.362
        2019-01-01 02:00:00,  2.241
        2019-01-01 03:00:00,  2.763
        2019-01-01 04:00:00,  2.590
        ... ... ... ... ...
        2019-09-16 06:00:00,  2.620
        2019-09-16 07:00:00,  2.644
        2019-09-16 08:00:00,  2.684
        2019-09-16 09:00:00,  2.968
        2019-09-16 10:00:00,  2.720
I need to reindex from January 1, 2019 to December 31, 2019. Where data does not exist (gaps in the data, or dates in the future), I want it filled with 0.

newDateIndex = pd.date_range(start='1/1/2019 00:00:00',end='12/31/2019 23:00:00', freq='H')

newDateIndex

DatetimeIndex(['2019-01-01 00:00:00', '2019-01-01 01:00:00',
               '2019-01-01 02:00:00', '2019-01-01 03:00:00',
               '2019-01-01 04:00:00', '2019-01-01 05:00:00',
               '2019-01-01 06:00:00', '2019-01-01 07:00:00',
               '2019-01-01 08:00:00', '2019-01-01 09:00:00',
               ...
               '2019-12-31 14:00:00', '2019-12-31 15:00:00',
               '2019-12-31 16:00:00', '2019-12-31 17:00:00',
               '2019-12-31 18:00:00', '2019-12-31 19:00:00',
               '2019-12-31 20:00:00', '2019-12-31 21:00:00',
               '2019-12-31 22:00:00', '2019-12-31 23:00:00'],
              dtype='datetime64[ns]', length=8760, freq='H')

reindexed_pd1 = pd1.reindex(newDateIndex,fill_value=0)
When I call reindex(), I get a ValueError: cannot reindex from a duplicate axis

I am confused, because when I look at the pandas documentation, it shows something similar:

>>> date_index = pd.date_range('1/1/2010', periods=6, freq='D')
>>> df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
...                    index=date_index)
>>> df2
            prices
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0

>>> date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')
>>> df2.reindex(date_index2)
            prices
2009-12-29     NaN
2009-12-30     NaN
2009-12-31     NaN
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0
2010-01-07     NaN

I don't understand what my "duplicate axis" is?

If you modify the pandas documentation example by adding duplicate dates, you get the same error:

date_index = pd.date_range('1/1/2010', periods=3, freq='D').append(pd.date_range('1/1/2010', periods=3, freq='D'))

df = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]}, index=date_index)

            prices
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-01   100.0
2010-01-02    89.0
2010-01-03    88.0

date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')
df.reindex(date_index2)


ValueError: cannot reindex from a duplicate axis
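
One way to see which index labels are repeated is to ask the index itself. This is only a minimal sketch built on the modified documentation example above; the names index_is_unique and duplicate_rows are illustrative, not part of the original question.

```python
import numpy as np
import pandas as pd

# Same data as the modified documentation example: each date appears twice.
date_index = pd.date_range("1/1/2010", periods=3, freq="D").append(
    pd.date_range("1/1/2010", periods=3, freq="D")
)
df = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]}, index=date_index)

# False here means at least one index label occurs more than once.
index_is_unique = df.index.is_unique

# keep=False flags every occurrence of a repeated label, not only the later ones.
duplicate_rows = df[df.index.duplicated(keep=False)].sort_index()

print(index_is_unique)
print(duplicate_rows)
```

Running the same two checks against pd1 should list the timestamps that occur more than once in the CSV.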

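There are duplicates in Startdate_Start Time. You need to drop or aggregate them before reindexing.

Thanks. I did indeed find unexpected duplicate timestamps in the source data file.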
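
For reference, dropping or aggregating the duplicates before reindexing might look roughly like the sketch below. The small pd1 frame is a hypothetical stand-in for the real CSV (one repeated timestamp, one missing hour), and whether to keep the first row or take the mean is an assumption that depends on what the duplicated readings actually mean.

```python
import pandas as pd

# Hypothetical stand-in for pd1: hourly readings where 01:00 appears twice
# and 02:00 is missing.
pd1 = pd.DataFrame(
    {"Data": [2.971, 2.362, 2.400, 2.763]},
    index=pd.to_datetime(
        [
            "2019-01-01 00:00:00",
            "2019-01-01 01:00:00",
            "2019-01-01 01:00:00",
            "2019-01-01 03:00:00",
        ]
    ),
)

# Hourly target index; the step is passed as a Timedelta so the example does
# not depend on which spelling of the hourly alias the pandas version accepts.
new_index = pd.date_range(
    start="2019-01-01 00:00:00",
    end="2019-01-01 04:00:00",
    freq=pd.Timedelta(hours=1),
)

# Option 1: keep the first row for each timestamp and discard the rest.
deduplicated = pd1[~pd1.index.duplicated(keep="first")]
filled_first = deduplicated.reindex(new_index, fill_value=0)

# Option 2: aggregate rows that share a timestamp (here, by taking the mean).
aggregated = pd1.groupby(level=0).mean()
filled_mean = aggregated.reindex(new_index, fill_value=0)

print(filled_first)
print(filled_mean)
```

Either way the index is unique by the time reindex() runs, so the missing hours and the future dates are filled with 0 instead of raising the ValueError.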