Python: how do I assign hours to data that has more rows than there are hours? - Fatal编程技术网

Consider the following:

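import numpy as np
import pandas as pd

# Hourly timestamps from 2027-01-01 00:00 through 2060-12-31 23:00
timeline = pd.date_range(start="2027-01-01",
              end="2061-01-01",
              freq="h")
timeline = timeline[:-1]  # drop the 2061-01-01 00:00 endpoint

df1 = pd.DataFrame()

# 8900 random values (1..5) for each year 2027..2060
for i in range(0, 34):
    df2 = pd.DataFrame()
    df2['value'] = np.random.randint(1, 6, 8900)
    df2['year'] = 2027 + i
    df1 = pd.concat([df1, df2], ignore_index=True)
# Position of each row within its year: 0, 1, 2, ...
df1['Row'] = df1.groupby(['year']).cumcount()
timeline = timeline.to_frame()
timeline = timeline.rename(columns={(0):'date'})
timeline['year'] = timeline.date.dt.year
timeline['Row'] = timeline.groupby(['year']).cumcount()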
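Note that 8900 is always greater than 366*24 (= 8784). The goal is to combine timeline and df1 so that, for each year, timeline is filled with the first n rows of df1 for that year. The remaining rows of that year are dropped, and we continue with the next year.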
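One way to express that goal in pandas, as a minimal sketch: number the rows within each year with `groupby(...).cumcount()` and keep only the rows below that year's hour count. The helper `hours_in_year` and the two-year toy data are hypothetical; the hour count is assumed to be 365*24 = 8760, or 366*24 = 8784 in a leap year, both of which 8900 exceeds.

```python
import calendar

import numpy as np
import pandas as pd


def hours_in_year(year: int) -> int:
    """Hypothetical helper: 366*24 hours in a leap year, 365*24 otherwise."""
    return (366 if calendar.isleap(year) else 365) * 24


# Toy stand-in for df1: 8900 rows per year, more than any year has hours
rng = np.random.default_rng(0)
years = [2027, 2028]
df1 = pd.DataFrame({
    "value": rng.integers(1, 6, 8900 * len(years)),
    "year": np.repeat(years, 8900),
})

# Position of each row inside its year, and the number of rows that year may keep
row = df1.groupby("year").cumcount()
limit = df1["year"].map(hours_in_year)

# Keep the first n rows of each year; the surplus rows are dropped
trimmed = df1[row < limit]

# expected sizes: 2027 -> 8760 rows, 2028 (leap year) -> 8784 rows
print(trimmed.groupby("year").size())
```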

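The problem I'm running into is that not every year has the same number of hours, because some years are leap years, which makes this rather tedious. I'd like to know whether there is an efficient way to handle it.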
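The uneven year lengths can be checked directly on an hourly range like the question's `timeline` (re-created here so the snippet runs on its own). This is only a diagnostic sketch; `hours_per_year` is a name chosen for illustration.

```python
import pandas as pd

# Same hourly range as in the question, without the 2061-01-01 00:00 endpoint
timeline = pd.date_range(start="2027-01-01", end="2061-01-01", freq="h")[:-1]

# Number of hourly timestamps falling in each calendar year
hours_per_year = pd.Series(timeline.year).value_counts().sort_index()

print(hours_per_year.unique())        # [8760 8784]
print((hours_per_year == 8784).sum()) # 9 leap years between 2027 and 2060
print(len(timeline))                  # 298056 = 34*8760 + 9*24
```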

Given that the number of hours differs from year to year, is there a way to perform the merge?

Code

# Give both frames the same 0..N-1 index so concat(axis=1) pairs rows by position
df1 = df1.reset_index(drop=True)
timeline = timeline.to_frame()
timeline = timeline.rename(columns={(0):'date'})
timeline['tlyear'] = timeline.date.dt.year
timeline = timeline.reset_index(drop=True)
pd.concat([timeline,df1], join='inner', axis=1).drop(columns='tlyear')
Full code

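import numpy as np
import pandas as pd

# Hourly timestamps from 2027-01-01 00:00 through 2060-12-31 23:00
timeline = pd.date_range(start="2027-01-01",
              end="2061-01-01",
              freq="h")
timeline = timeline[:-1]
df1 = pd.DataFrame()

# 8900 random values (1..5) for each year 2027..2060
for i in range(0, 34):
    df2 = pd.DataFrame()
    df2['value'] = np.random.randint(1, 6, 8900)
    df2['year'] = 2027 + i
    df1 = pd.concat([df1, df2])
df1 = df1.reset_index(drop=True)

timeline = timeline.to_frame()
timeline = timeline.rename(columns={(0):'date'})
timeline['tlyear'] = timeline.date.dt.year
timeline = timeline.reset_index(drop=True)

# Column-wise concat on the shared 0..N-1 index; join='inner' keeps only the
# first len(timeline) rows of df1. drop(columns=...) replaces the positional
# axis argument, which pandas 2.x no longer accepts.
pd.concat([timeline,df1], join='inner', axis=1).drop(columns='tlyear')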
Output sample:

    date                value   year
0   2027-01-01 00:00:00 5       2027
1   2027-01-01 01:00:00 2       2027
2   2027-01-01 02:00:00 3       2027
3   2027-01-01 03:00:00 4       2027
4   2027-01-01 04:00:00 1       2027
... ... ... ...
298051  2060-12-31 19:00:00 1   2060
298052  2060-12-31 20:00:00 3   2060
298053  2060-12-31 21:00:00 2   2060
298054  2060-12-31 22:00:00 1   2060
298055  2060-12-31 23:00:00 3   2060
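Why position matters here: `pd.concat(..., axis=1, join='inner')` aligns rows on index labels, and after `reset_index(drop=True)` both frames carry the labels 0..N-1, so rows are paired purely by position and the result is as long as the shorter input. A toy check with hypothetical data (two "years" with 3 and 4 slots, 5 data rows each) shows the surplus rows of the first year being paired with the second year's slots. With the real data the same happens from `2028-01-01 00:00` onward, where `df1` row 8760 still belongs to 2027.

```python
import pandas as pd

# Toy "timeline": year 1 has 3 slots, year 2 has 4 slots
timeline = pd.DataFrame({"slot": range(7), "tlyear": [1, 1, 1, 2, 2, 2, 2]})
# Toy "df1": 5 data rows per year, more than either year needs
df1 = pd.DataFrame({"value": range(10), "year": [1] * 5 + [2] * 5})

# Positional pairing on the shared 0..N-1 index; 'inner' keeps the 7 common labels
combined = pd.concat([timeline, df1], join="inner", axis=1)

# Rows where the timeline's year and the data's year disagree
mismatch = combined[combined["tlyear"] != combined["year"]]
print(mismatch)
```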
Edit

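import numpy as np
import pandas as pd

# Setup as in the full code: an hourly DatetimeIndex and an empty df1
timeline = pd.date_range(start="2027-01-01", end="2061-01-01", freq="h")[:-1]
df1 = pd.DataFrame()

for i in range(0, 34):
    df2 = pd.DataFrame()
    df2['value'] = np.random.randint(1, 6, 8900)
    df2['year'] = 2027 + i
    df1 = pd.concat([df1, df2])
df1 = df1.reset_index(drop=True)

timeline = timeline.to_frame()
timeline = timeline.rename(columns={(0):'date'})
# .dt.year is int32 in pandas 2.x; merge_asof expects both 'year' keys to share a dtype
timeline['year'] = timeline.date.dt.year.astype('int64')
timeline = timeline.reset_index(drop=True)

# merge_asof requires both frames to be sorted by 'year' (true by construction)
pd.merge_asof(df1, timeline, on='year', direction='nearest')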
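A note on what `pd.merge_asof` does with `on='year'`: it matches each left row to at most one right row, and both frames must be sorted by the key. Because every `df1` row of a given year has the same key, all of them receive the same single `date` rather than consecutive hours. A toy check with hypothetical data:

```python
import pandas as pd

# Toy hourly timeline covering two years (2 and 3 timestamps)
timeline = pd.DataFrame({
    "date": pd.to_datetime(["2027-12-31 22:00", "2027-12-31 23:00",
                            "2028-01-01 00:00", "2028-01-01 01:00",
                            "2028-01-01 02:00"]),
    "year": [2027, 2027, 2028, 2028, 2028],
})
# Toy data: 4 rows per year, already sorted by 'year' as merge_asof requires
df1 = pd.DataFrame({"value": range(8), "year": [2027] * 4 + [2028] * 4})

result = pd.merge_asof(df1, timeline, on="year", direction="nearest")

# One output row per left row, but only one distinct date per year
print(len(result))
print(result.groupby("year")["date"].nunique())
```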

I came up with a slightly different approach. We can do the following:

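import numpy as np
import pandas as pd

# Hourly timestamps from 2027-01-01 00:00 through 2060-12-31 23:00
timeline = pd.date_range(start="2027-01-01",
              end="2061-01-01",
              freq="h")
timeline = timeline[:-1]  # drop the 2061-01-01 00:00 endpoint

df1 = pd.DataFrame()

# 8900 random values (1..5) for each year 2027..2060
for i in range(0, 34):
    df2 = pd.DataFrame()
    df2['value'] = np.random.randint(1, 6, 8900)
    df2['year'] = 2027 + i
    df1 = pd.concat([df1, df2], ignore_index=True)
# Number the rows within each year on both sides, so (year, Row) identifies
# "the n-th hour of the year" in timeline and "the n-th value of the year" in df1
df1['Row'] = df1.groupby(['year']).cumcount()
timeline = timeline.to_frame()
timeline = timeline.rename(columns={(0):'date'})
timeline['year'] = timeline.date.dt.year
timeline['Row'] = timeline.groupby(['year']).cumcount()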
Then merge on them:

result = timeline.merge(df1, on=['year', 'Row'])

I believe this will enforce the row order.
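The `year`/`Row` idea can be checked on a small hypothetical example. Rows beyond a year's slot count find no partner in the inner join and drop out, and the pandas documentation states that an inner merge preserves the order of the left keys, so the result follows `timeline`.

```python
import pandas as pd

# Toy timeline: year 1 has 2 slots, year 2 has 3 slots
timeline = pd.DataFrame({"slot": ["a", "b", "c", "d", "e"],
                         "year": [1, 1, 2, 2, 2]})
# Toy data: 4 rows per year, more than any year needs
df1 = pd.DataFrame({"value": [10, 11, 12, 13, 20, 21, 22, 23],
                    "year": [1, 1, 1, 1, 2, 2, 2, 2]})

# Position of each row inside its year (0, 1, 2, ...)
timeline["Row"] = timeline.groupby("year").cumcount()
df1["Row"] = df1.groupby("year").cumcount()

# Inner join on (year, Row): surplus df1 rows (Row >= slots in that year) drop out
result = timeline.merge(df1, on=["year", "Row"])
print(result[["slot", "year", "value"]])
```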

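I'm a bit worried about the order in which keys are matched. If we join on some non-unique key, then only the first row of the left table is merged with the first match, and the second row with the second match. Also, the row that matched the first row won't be used again for the second row. Maybe I'm mistaken, but I couldn't find anything in the documentation that describes this behavior. I've edited the code so that it merges based on a condition. Please check whether it works.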
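On the key-order concern: in pandas, when a key value occurs several times on both sides, `merge` pairs every matching left row with every matching right row (a Cartesian product per key) rather than first-with-first. With `on=['year', 'Row']` each pair is unique on both sides, so the join is one-to-one, and the `validate` argument of `DataFrame.merge` can assert that. The toy frames below are hypothetical.

```python
import pandas as pd

left = pd.DataFrame({"year": [1, 1], "a": ["L0", "L1"]})
right = pd.DataFrame({"year": [1, 1], "b": ["R0", "R1"]})

# Non-unique key on both sides: 2 x 2 = 4 rows, every left row meets every right row
many = left.merge(right, on="year")
print(len(many))  # 4

# A within-year counter makes the composite key (year, Row) unique on both sides
left["Row"] = left.groupby("year").cumcount()
right["Row"] = right.groupby("year").cumcount()

# validate="one_to_one" raises pandas.errors.MergeError if keys repeat on either side
one = left.merge(right, on=["year", "Row"], validate="one_to_one")
print(one[["a", "b"]].values.tolist())  # [['L0', 'R0'], ['L1', 'R1']]
```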