Python: How do I assign hours to data that has more rows than there are hours in a year?
Consider the following:
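    import numpy as np
    import pandas as pd

    # Hourly timestamps from 2027-01-01 00:00 through 2060-12-31 23:00.
    timeline = pd.date_range(start="2027-01-01",
                             end="2061-01-01",
                             freq="h")  # "H" in the original; newer pandas uses lowercase "h"
    timeline = timeline[:-1]  # drop the 2061-01-01 00:00 endpoint

    # 34 years of data, 8900 rows per year.
    df1 = pd.DataFrame()
    for i in range(0, 34):
        df2 = pd.DataFrame()
        df2['value'] = np.random.randint(1, 6, 8900)
        df2['year'] = 2027 + i
        df1 = pd.concat([df1, df2])
    df1['Row'] = df1.groupby(['year']).cumcount()  # 0-based row counter within each year

    timeline = timeline.to_frame()
    timeline = timeline.rename(columns={0: 'date'})
    timeline['year'] = timeline.date.dt.year
    timeline['Row'] = timeline.groupby(['year']).cumcount()  # 0-based hour counter within each year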
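Note that 8900 is always greater than 366*24. The goal is to combine timeline and df1 so that the first n rows of each year in df1 fill that year's hours in the timeline. The surplus rows for that year are dropped, and we continue with the next year.
The problem I'm running into is that not all years have the same number of hours, because some of them are leap years, which makes this rather awkward. I'd like to know whether there is an efficient way to handle this.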
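To make the leap-year arithmetic concrete, here is a minimal sketch that counts the hourly steps in each year of the timeline. The helper name `hours_in_year` is my own, and the calculation assumes naive timestamps with no time zone or DST shifts.

```python
import calendar


def hours_in_year(year: int) -> int:
    """Number of hourly steps in a calendar year (no time zone or DST handling)."""
    return (366 if calendar.isleap(year) else 365) * 24


years = range(2027, 2061)  # 2027 through 2060, matching the timeline
hours = {year: hours_in_year(year) for year in years}

print(hours[2027])                 # 8760, a common year
print(hours[2028])                 # 8784, a leap year
print(sum(hours.values()))         # 298056 hourly timestamps in total
print(max(hours.values()) < 8900)  # True: 8900 rows always cover a full year
```

So each year needs either 8760 or 8784 of its 8900 rows, and the number of surplus rows to drop differs between common years and leap years.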
Given the complication that the number of hours differs from year to year, is there a way to perform the merge?

Code
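    # Give both frames a fresh 0..n-1 index so they can be aligned by position.
    df1 = df1.reset_index(drop=True)
    timeline = timeline.to_frame()
    timeline = timeline.rename(columns={0: 'date'})
    timeline['tlyear'] = timeline.date.dt.year
    timeline = timeline.reset_index(drop=True)
    # join='inner' keeps only the index labels present in both frames.
    pd.concat([timeline, df1], join='inner', axis=1).drop(columns='tlyear')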
Full code

    import numpy as np
    import pandas as pd

    timeline = pd.date_range(start="2027-01-01",
                             end="2061-01-01",
                             freq="h")  # "H" in the original; newer pandas uses lowercase "h"
    timeline = timeline[:-1]

    df1 = pd.DataFrame()
    for i in range(0, 34):
        df2 = pd.DataFrame()
        df2['value'] = np.random.randint(1, 6, 8900)
        df2['year'] = 2027 + i
        df1 = pd.concat([df1, df2])

    df1 = df1.reset_index(drop=True)
    timeline = timeline.to_frame()
    timeline = timeline.rename(columns={0: 'date'})
    timeline['tlyear'] = timeline.date.dt.year
    timeline = timeline.reset_index(drop=True)
    # The positional axis argument of drop() was removed in pandas 2.0, so use columns=.
    pd.concat([timeline, df1], join='inner', axis=1).drop(columns='tlyear')
Output sample

                          date  value  year
    0      2027-01-01 00:00:00      5  2027
    1      2027-01-01 01:00:00      2  2027
    2      2027-01-01 02:00:00      3  2027
    3      2027-01-01 03:00:00      4  2027
    4      2027-01-01 04:00:00      1  2027
    ...                    ...    ...   ...
    298051 2060-12-31 19:00:00      1  2060
    298052 2060-12-31 20:00:00      3  2060
    298053 2060-12-31 21:00:00      2  2060
    298054 2060-12-31 22:00:00      1  2060
    298055 2060-12-31 23:00:00      3  2060
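One way to sanity-check a positional concat like the one above is to compare the year of each timestamp with the year carried over from df1. The following is only a sketch: it rebuilds the question's inputs in a slightly more compact form, and the names `paired`, `date_year`, `mismatch` and `first_mismatch` are my own.

```python
import numpy as np
import pandas as pd

# Rebuild the inputs from the question ("h" is the lowercase hourly alias).
timeline = pd.DataFrame(
    {"date": pd.date_range(start="2027-01-01", end="2061-01-01", freq="h")[:-1]}
)
df1 = pd.concat(
    [pd.DataFrame({"value": np.random.randint(1, 6, 8900), "year": 2027 + i})
     for i in range(34)],
    ignore_index=True,
)

# Pair the two frames purely by position, as the concat approach does.
paired = pd.concat([timeline, df1], axis=1, join="inner")
paired["date_year"] = paired["date"].dt.year

# Rows where the year from df1 disagrees with the year of the timestamp.
mismatch = paired["date_year"] != paired["year"]
first_mismatch = int(mismatch.idxmax()) if mismatch.any() else None

print(len(paired))           # number of paired rows
print(int(mismatch.sum()))   # number of rows whose two years disagree
print(first_mismatch)        # index of the first disagreement, if any
print(paired.loc[mismatch].head())
```

If `mismatch` contains any True values, the surplus rows of a year have spilled into the following year instead of being dropped. In that case the pairing has to be done on a per-year key rather than by position alone.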
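Edit

    # Continues from the full code above: timeline is still the hourly
    # DatetimeIndex and df1 starts out as an empty DataFrame.
    for i in range(0, 34):
        df2 = pd.DataFrame()
        df2['value'] = np.random.randint(1, 6, 8900)
        df2['year'] = 2027 + i
        df1 = pd.concat([df1, df2])

    df1 = df1.reset_index(drop=True)
    timeline = timeline.to_frame()
    timeline = timeline.rename(columns={0: 'date'})
    # merge_asof requires both key columns to have the same dtype.
    timeline['year'] = timeline.date.dt.year.astype('int64')
    timeline = timeline.reset_index(drop=True)
    # merge_asof matches on 'year' only and returns one row per row of df1.
    pd.merge_asof(df1, timeline, on='year', direction='nearest')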
I came up with a slightly different approach. We can do the following:
    import numpy as np
    import pandas as pd

    timeline = pd.date_range(start="2027-01-01",
                             end="2061-01-01",
                             freq="h")  # "H" in the original; newer pandas uses lowercase "h"
    timeline = timeline[:-1]

    df1 = pd.DataFrame()
    for i in range(0, 34):
        df2 = pd.DataFrame()
        df2['value'] = np.random.randint(1, 6, 8900)
        df2['year'] = 2027 + i
        df1 = pd.concat([df1, df2])
    # Number the rows within each year: 0, 1, ..., 8899.
    df1['Row'] = df1.groupby(['year']).cumcount()

    timeline = timeline.to_frame()
    timeline = timeline.rename(columns={0: 'date'})
    timeline['year'] = timeline.date.dt.year
    # Number the hours within each year: 0, 1, ..., 8759 (or 8783 in a leap year).
    timeline['Row'] = timeline.groupby(['year']).cumcount()
Then merge on those two columns:

    result = timeline.merge(df1, on=['year', 'Row'])
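As a rough check of the merge on `['year', 'Row']`, the sketch below rebuilds both frames in a compact form and verifies that every timestamp receives exactly one row from its own year. The name `rows_per_year` and the assertions are my additions, not part of the original answer.

```python
import numpy as np
import pandas as pd

timeline = pd.DataFrame(
    {"date": pd.date_range(start="2027-01-01", end="2061-01-01", freq="h")[:-1]}
)
timeline["year"] = timeline["date"].dt.year
timeline["Row"] = timeline.groupby("year").cumcount()

df1 = pd.concat(
    [pd.DataFrame({"value": np.random.randint(1, 6, 8900), "year": 2027 + i})
     for i in range(34)],
    ignore_index=True,
)
df1["Row"] = df1.groupby("year").cumcount()

# Inner merge: df1 rows whose Row exceeds the hours in that year find no
# partner in timeline and are dropped.
result = timeline.merge(df1, on=["year", "Row"])

# Every timestamp should appear exactly once ...
assert len(result) == len(timeline)
assert result["date"].is_unique
# ... and the year key should agree with the year of the timestamp.
assert (result["date"].dt.year == result["year"]).all()

rows_per_year = result.groupby("year").size()
print(rows_per_year.loc[[2027, 2028]])  # hours kept for a common year and a leap year
```

Because the inner merge only keeps `(year, Row)` pairs that exist in both frames, leap years keep 24 more rows than common years without any special-casing.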
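I believe this enforces the row order.

I'm a little worried about the order in which keys are matched. If we join on a non-unique key, then presumably only the first row of the left table is merged with the first matching row of the second table, and that first matched row is not reused for the second row of the first table. Maybe I'm mistaken, but I couldn't find anything in the documentation that describes this behavior.

I have edited the code so that the merge is based on a condition. Please check whether it works.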
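On the concern about non-unique keys: a plain pandas merge on a key that repeats on both sides pairs every left row with every matching right row, rather than matching first with first. The toy sketch below (frame and column names are made up for illustration) shows that, and then shows how the per-year counter makes the key unique. It uses `validate="one_to_one"` so pandas raises an error if either side has duplicate keys.

```python
import pandas as pd

left = pd.DataFrame({"year": [2027, 2027, 2028], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"year": [2027, 2027, 2028], "right_val": [1, 2, 3]})

# Joining on the non-unique "year" key pairs every left row with every
# matching right row, so 2027 contributes 2 * 2 = 4 rows and 2028 one row.
by_year = left.merge(right, on="year")
print(len(by_year))  # 5

# Adding a per-year counter makes the (year, Row) pair unique in both frames.
left["Row"] = left.groupby("year").cumcount()
right["Row"] = right.groupby("year").cumcount()

# validate="one_to_one" makes pandas raise MergeError if either side has
# duplicate keys, so a successful merge confirms the 1:1 pairing.
by_year_row = left.merge(right, on=["year", "Row"], validate="one_to_one")
print(by_year_row)
```

With `(year, Row)` as the key, each left row has at most one partner, so the result does not depend on how pandas orders duplicate matches.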