Python resampling problem/misunderstanding


I am trying to use pandas to build a histogram of an earthquake sequence.

My input is a CSV file that looks like this:

ID,DATE,LAT,LON,DEPTH,MAG,LOCALITY
ISTerre2020odcbbh,2020-07-18T23:24:03.616341Z,45.426,6.32499,3.56121,1.56979,"MONTGELLAFREY"
ISTerre2020nsbzaa,2020-07-12T23:32:31.159491Z,45.4239,6.32597,1.79717,0.818867,"MONTGELLAFREY"
ISTerre2020lcxxda,2020-06-06T09:29:45.006351Z,45.4126,6.32702,3.7011,1.58432,"MONTGELLAFREY"
ISTerre2020jppugg,2020-05-15T23:30:27.553768Z,45.4288,6.29128,5.03303,1.0121,"LA CHAPELLE"
ISTerre2020flokvv,2020-03-18T02:46:01.877839Z,45.4134,6.38374,3.06686,1.08096,"SAINT-FRANCOIS-LONGCHAMP"
ISTerre2019znoncu,2019-12-28T11:44:51.242507Z,45.4341,6.33249,7.61996,1.26731,"EPIERRE"
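I would like to insert the months or days that are missing from the catalogue (that is, the days/months with no earthquake) into the dataframe built by pandas, so that the histogram shows an empty bar for the months with no events.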

I tried to do this with resample('M'), but it does not work and I get the following error:

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
Here is a sample of my script:

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('catalogue.csv')
df.info()
df["DATE"] = df["DATE"].astype("datetime64")
(df["DATE"].groupby([df["DATE"].dt.year, df["DATE"].dt.month]).count()).plot(kind="bar") #to plot the histogram with the missing months

from datetime import datetime
from datetime import timedelta
from dateutil import rrule    
    
data1=df.sort_values('DATE').set_index('DATE')  
month_groups_resample = data1['DATE'].resample('M').count()
ax = month_groups_resample.plot(kind='bar',figsize=(10,5),legend=None)
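I ran many different tests with resample, without success. I am sure there is a very simple way to do this, but I am not fluent enough in Python.

I hope someone can help me.

Regards,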
Mickeel.

You can first round the dates to whole days, create an index with no missing dates, and then reindex the whole dataframe with this new index.

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('catalogue.csv')
df.info()
# pd.to_datetime parses the ISO timestamps (a bare astype("datetime64")
# is rejected by recent pandas versions)
df["DATE"] = pd.to_datetime(df["DATE"])

data1 = df.sort_values('DATE').set_index('DATE')
# floor rather than round, so a late-evening event stays on its own day
data1.index = data1.index.floor(freq='D')

# Index with all the dates (the closed= argument is dropped: it no longer
# exists in pandas 2.x, and closed='left' excluded the last day)
date_range = pd.date_range(
    start=data1.index[0], end=data1.index[-1], freq='D')

# Fill the original dataframe. By default insert NaNs.
# Note: reindex needs unique labels, so this assumes at most one event per day
data1 = data1.reindex(date_range)

# I'm using ID as representative of the number of events
df_num_events = data1.ID.groupby(level=0).count()
df_num_events.plot()
plt.show()
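
As a complement, here is a minimal sketch of the monthly count the question was aiming for with resample. The TypeError quoted in the question is raised when resample is called on a frame that still has its default RangeIndex, so the sketch parses DATE and makes it the index first. The names monthly_counts and SAMPLE_CSV are only illustrative, the rows are copied from the question so the sketch runs without catalogue.csv, and the "MS" (month start) alias is my own choice instead of the 'M' used in the question, because recent pandas versions have started renaming the month-end alias.

```python
import io

import pandas as pd

# Rows copied from the question, inlined so the sketch is self-contained.
SAMPLE_CSV = """ID,DATE,LAT,LON,DEPTH,MAG,LOCALITY
ISTerre2020odcbbh,2020-07-18T23:24:03.616341Z,45.426,6.32499,3.56121,1.56979,"MONTGELLAFREY"
ISTerre2020nsbzaa,2020-07-12T23:32:31.159491Z,45.4239,6.32597,1.79717,0.818867,"MONTGELLAFREY"
ISTerre2020lcxxda,2020-06-06T09:29:45.006351Z,45.4126,6.32702,3.7011,1.58432,"MONTGELLAFREY"
ISTerre2020jppugg,2020-05-15T23:30:27.553768Z,45.4288,6.29128,5.03303,1.0121,"LA CHAPELLE"
ISTerre2020flokvv,2020-03-18T02:46:01.877839Z,45.4134,6.38374,3.06686,1.08096,"SAINT-FRANCOIS-LONGCHAMP"
ISTerre2019znoncu,2019-12-28T11:44:51.242507Z,45.4341,6.33249,7.61996,1.26731,"EPIERRE"
"""


def monthly_counts(df):
    """Count events per calendar month; months without events get 0."""
    # resample only works on a datetime-like index, so parse DATE and
    # move it into the index first.
    dates = pd.to_datetime(df["DATE"], utc=True)
    events = df.assign(DATE=dates).set_index("DATE").sort_index()
    # "MS" bins by month start; empty bins are kept and counted as 0.
    return events["ID"].resample("MS").count()


counts = monthly_counts(pd.read_csv(io.StringIO(SAMPLE_CSV)))
print(counts)
```

With the real file, monthly_counts(pd.read_csv('catalogue.csv')).plot(kind='bar') should then draw one bar per month, including the empty ones. Unlike reindexing the raw frame, this does not depend on there being at most one event per day.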

Here is a solution that uses only matplotlib:

import datetime
from collections import Counter
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

with open("catalogue.csv", "r") as _f:
    # skip first header line
    _f.readline()
    # keep only the "YYYY-MM" part of each DATE field, skipping blank lines
    dates = [datetime.datetime.strptime(t.split(",")[1][:7], "%Y-%m")
             for t in _f if t.strip()]
# number of events per month
count_dict = Counter(dates)
x = list(count_dict.keys())
y = [count_dict[k] for k in x]

fig, ax1 = plt.subplots(figsize=(9, 7))
ax1.xaxis.set_major_locator(mdates.YearLocator())
ax1.xaxis.set_minor_locator(mdates.MonthLocator())
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%m\n%Y"))
ax1.xaxis.set_minor_formatter(mdates.DateFormatter("%m"))
ax1.yaxis.get_major_locator().set_params(integer=True)
ax1.set_xlabel("months of measurements")
ax1.set_ylabel("count of event")
fig.suptitle("MY MAIN TITLE")
rects = ax1.bar(x, y, width=20)  # width is in days on a date axis; the default 0.8 is barely visible
plt.show()

Sorry, Tojur, but I get the following error with your solution: