Python: Converting a DataFrame with periods into a time series array


python pandas


I have this problem: I am trying to convert a DataFrame (loaded from a CSV file with millions of rows) that has this structure:

| start               | end                 | type | value |
|---------------------|---------------------|------|-------|
| 2016-01-01 00:00:00 | 2016-01-02 00:00:00 | 0    | 200   |
| 2016-01-02 01:00:00 | 2016-01-03 00:00:00 | 1    | 100   |
| 2016-01-15 08:00:00 | 2016-01-16 07:00:00 | 0    | 15    |
| 2016-01-16 07:00:00 | 2016-01-16 07:00:00 | 2    | 80    |

I would like to convert it into a structure with the following format:

| timestamp           | 0   | 1   | 2 |
|---------------------|-----|-----|---|
| 2016-01-01 00:00:00 | 200 | 0   | 0 |
| ...                 | 200 | 0   | 0 |
| 2016-01-02 00:00:00 | 200 | 0   | 0 |
| 2016-01-02 01:00:00 | 0   | 100 | 0 |
| ...                 | 0   | 100 | 0 |
| 2016-01-03 00:00:00 | 0   | 100 | 0 |
| ...                 | 0   | 0   | 0 |
| 2016-01-15 08:00:00 | 15  | 0   | 0 |

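In other words, while the first table specifies the start and end of each event of type N together with its value, I would like to end up with a table that has one row per timestamp over the whole datetime range and, for each timestamp, the values of all event types.

I am trying to find an efficient solution. The best one I have found converts each datetime to an integer (hours since a base date) and then uses that value as a row index into a NumPy array. Unfortunately, my code uses a for loop, and I was wondering whether you could come up with something better.
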
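import pandas as pd
import numpy as np

# Example data frame
df = pd.DataFrame({'start': ['2016-01-01 00:00:00', '2016-01-02 01:00:00', '2016-01-15 08:00:00', '2016-01-16 07:00:00'],
                   'end': ['2016-01-02 00:00:00', '2016-01-03 00:00:00', '2016-01-16 07:00:00', '2016-01-16 07:00:00'],
                   'id': [0, 1, 0, 2],
                   'x': [200, 100, 15, 80]})

# Convert the strings to datetimes
df['start'] = pd.to_datetime(df['start'], format='%Y-%m-%d %H:%M:%S')
df['end'] = pd.to_datetime(df['end'], format='%Y-%m-%d %H:%M:%S')

# Get the datetime offset (pd.Timestamp replaces pd.datetime, which newer pandas no longer provides)
OFFSET = pd.Timestamp(2016, 1, 1, 0, 0, 0).timestamp()  # this is the first datetime I have

# Convert the dates to integers (nanoseconds since the epoch, then seconds, then hours)
df['start'] = ((df['start'].astype(np.int64) / 1e9 - OFFSET) / 3600).astype(np.int32) - 1
df['end'] = ((df['end'].astype(np.int64) / 1e9 - OFFSET) / 3600).astype(np.int32) - 1

# Target data structure
x = np.zeros((1000, 3))  # its number of rows must equal the number of timestamps

# Put the data into the target structure
for i in range(0, 3):
    x[df.iloc[i].start:df.iloc[i].end, df.iloc[i].id] = df.iloc[i].x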

The conversion from datetime to integer is based on .
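
The hour-offset idea can also be written with timedelta arithmetic instead of epoch seconds, which avoids the manual nanosecond and time zone handling. This is only a minimal sketch: it assumes the base date is the earliest `start`, and the names `base`, `one_hour`, `start_idx` and `end_idx` are illustrative, not part of the original post.

```python
import pandas as pd

# Two of the sample rows, already parsed as datetimes.
df = pd.DataFrame({
    'start': pd.to_datetime(['2016-01-01 00:00:00', '2016-01-02 01:00:00']),
    'end': pd.to_datetime(['2016-01-02 00:00:00', '2016-01-03 00:00:00']),
})

# Assumption: the base date is the earliest start in the data.
base = df['start'].min()
one_hour = pd.Timedelta(hours=1)

# Floor-dividing a timedelta Series by a Timedelta yields whole hours as integers.
start_idx = ((df['start'] - base) // one_hour).astype('int64')
end_idx = ((df['end'] - base) // one_hour).astype('int64')

print(start_idx.tolist(), end_idx.tolist())
```

With this convention the first timestamp maps to row 0, so the resulting integers can be used directly as row positions in a NumPy array.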
My experience with Python is limited (I am mostly an R user), so I am hoping there is a better (vectorized?) and more elegant solution.
Thanks in advance!
I would use date_range to create all the datetimes in a new column New, then use unnesting and pivot_table:

df['New']=[pd.date_range(x,y,freq='H') for x , y in zip(df.start,df.end)]
yourdf=unnesting(df,['New']).pivot_table(values='x',index='New',columns='id',aggfunc='sum',fill_value=0)
yourdf.head()
Out[327]: 
id                     0    1   2
New                              
2016-01-01 00:00:00  200    0   0
2016-01-01 01:00:00  200    0   0
2016-01-01 02:00:00  200    0   0
2016-01-01 03:00:00  200    0   0
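
The `unnesting` function called above is not defined in the answer as shown here; it appears to be a user-defined helper rather than a pandas function. As a hedged, self-contained sketch of the same idea, the version below swaps in `DataFrame.explode` (available since pandas 0.25) for that helper. The names `long_df`, `wide`, `full_index` and `wide_full` are illustrative, and the final `reindex` step is an added assumption to fill the hours where no event is active with zeros, as in the desired output table.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'start': pd.to_datetime(['2016-01-01 00:00:00', '2016-01-02 01:00:00',
                             '2016-01-15 08:00:00', '2016-01-16 07:00:00']),
    'end': pd.to_datetime(['2016-01-02 00:00:00', '2016-01-03 00:00:00',
                           '2016-01-16 07:00:00', '2016-01-16 07:00:00']),
    'id': [0, 1, 0, 2],
    'x': [200, 100, 15, 80],
})

one_hour = pd.Timedelta(hours=1)

# One list of hourly timestamps per row (both endpoints included).
df['New'] = [pd.date_range(s, e, freq=one_hour).tolist()
             for s, e in zip(df['start'], df['end'])]

# explode() gives every timestamp its own row, repeating id and x.
long_df = df.explode('New')
long_df['New'] = pd.to_datetime(long_df['New'])

# One column per event id; overlapping events of the same id are summed.
wide = long_df.pivot_table(values='x', index='New', columns='id',
                           aggfunc='sum', fill_value=0)

# Assumption: hours with no active event should appear as rows of zeros.
full_index = pd.date_range(df['start'].min(), df['end'].max(), freq=one_hour)
wide_full = wide.reindex(full_index, fill_value=0)

print(wide_full.head())
```

Note that this materializes one row per event per hour, so for millions of long-running events the intermediate `long_df` can become large; whether that is acceptable depends on the data.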