Python: creating bins for a timestamp column

I am trying to create proper bins for a column of timestamp intervals, using something like:

df['Bin'] = pd.cut(df['interval_length'], bins=pd.to_timedelta(['00:00:00','00:10:00','00:20:00','00:30:00','00:40:00','00:50:00','00:60:00']))
The resulting df looks like this:

time_interval  |           bin
  00:17:00        (0 days 00:10:00, 0 days 00:20:00]
  01:42:00                NaN
  00:15:00        (0 days 00:10:00, 0 days 00:20:00]
  00:00:00                NaN
  00:06:00        (0 days 00:00:00, 0 days 00:10:00]
This result is a bit off. I want only the time values, not the days, and I want the last bin to run from 60 minutes up to inf (or more).

Desired output:

time_interval  |           bin
      00:17:00        (00:10:00,00:20:00]
      01:42:00        (00:60:00,inf]
      00:15:00        (00:10:00,00:20:00]
      00:00:00        (00:00:00,00:10:00]
      00:06:00        (00:00:00,00:10:00]

Thanks for your attention.

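In pandas, inf does not exist for timedeltas, so the maximum value, pd.Timedelta.max, is used as the last edge. To include the lowest value as well, pass include_lowest=True. If you want the bins filled with timedeltas: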

b = pd.to_timedelta(['00:00:00','00:10:00','00:20:00',
                     '00:30:00','00:40:00',
                     '00:50:00','00:60:00'])
b = b.append(pd.Index([pd.Timedelta.max]))
df['Bin'] = pd.cut(df['time_interval'],  include_lowest=True, bins=b)
print (df)
  time_interval                                             Bin
0      00:17:00              (0 days 00:10:00, 0 days 00:20:00]
1      01:42:00  (0 days 01:00:00, 106751 days 23:47:16.854775]
2      00:15:00              (0 days 00:10:00, 0 days 00:20:00]
3      00:00:00     (-1 days +23:59:59.999999, 0 days 00:10:00]
4      00:06:00     (-1 days +23:59:59.999999, 0 days 00:10:00]
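
The first interval above starts just below zero because include_lowest=True shifts the lowest edge down slightly so that 00:00:00 falls inside it. If left-closed intervals are acceptable, a possible variation is right=False, which keeps the first label at exactly zero. This is a sketch rather than part of the original answer: the sample frame is rebuilt from the question's values, and note that a value sitting exactly on an edge (for example 00:10:00) then moves into the next bin.

```python
import pandas as pd

# Sample data copied from the question; building the column with
# pd.to_timedelta is an assumption about its dtype.
df = pd.DataFrame({
    'time_interval': pd.to_timedelta(
        ['00:17:00', '01:42:00', '00:15:00', '00:00:00', '00:06:00'])
})

edges = pd.to_timedelta(['00:00:00', '00:10:00', '00:20:00', '00:30:00',
                         '00:40:00', '00:50:00', '00:60:00'])
# No infinite Timedelta exists, so the largest representable value
# stands in for the open upper end.
edges = edges.append(pd.Index([pd.Timedelta.max]))

# right=False makes every interval [lower, upper), so 00:00:00 belongs
# to the first bin without adjusting its lower edge.
df['Bin'] = pd.cut(df['time_interval'], bins=edges, right=False)
print(df)
```
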
If you need strings instead of timedeltas, append 'inf' to the list of values and use zip to create the labels:

vals = ['00:00:00','00:10:00','00:20:00',
        '00:30:00','00:40:00', '00:50:00','00:60:00']

b = pd.to_timedelta(vals).append(pd.Index([pd.Timedelta.max]))

vals.append('inf')
labels = ['{}-{}'.format(i, j) for i, j in zip(vals[:-1], vals[1:])] 

df['Bin'] = pd.cut(df['time_interval'],  include_lowest=True, bins=b, labels=labels)
print (df)
  time_interval                Bin
0      00:17:00  00:10:00-00:20:00
1      01:42:00       00:60:00-inf
2      00:15:00  00:10:00-00:20:00
3      00:00:00  00:00:00-00:10:00
4      00:06:00  00:00:00-00:10:00
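
Another route, not taken from the answer above, is to bin plain numbers instead of timedeltas, since float bins can use a true infinite upper edge. The sketch below assumes the column holds timedeltas and that minute-based labels such as 10-20 are acceptable. The names minutes and edges are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'time_interval': pd.to_timedelta(
        ['00:17:00', '01:42:00', '00:15:00', '00:00:00', '00:06:00'])
})

# Convert each timedelta to a float number of minutes.
minutes = df['time_interval'].dt.total_seconds() / 60

# Numeric edges allow np.inf as a genuinely open upper bound.
edges = [0, 10, 20, 30, 40, 50, 60, np.inf]
labels = ['{}-{}'.format(lo, hi) for lo, hi in zip(edges[:-1], edges[1:])]

df['Bin'] = pd.cut(minutes, bins=edges, include_lowest=True, labels=labels)
print(df)
```

The trade-off is that the labels are in minutes rather than HH:MM:SS strings. If the time format matters, the string labels built with zip above can be passed instead.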

You can solve this with labels:

df['Bin'] = pd.cut(df['interval_length'], bins=pd.to_timedelta(['00:00:00','00:10:00','00:20:00','00:30:00','00:40:00','00:50:00','00:60:00', '24:00:00']), labels=['(00:00:00,00:10:00]', '(00:10:00,00:20:00]', '(00:20:00,00:30:00]', '(00:30:00,00:40:00]', '(00:40:00,00:50:00]', '(00:50:00,00:60:00]', '(00:60:00,inf]'])
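
Two details of this approach are worth checking on your own data. The last edge is 24:00:00 even though its label says inf, and include_lowest is not set. The sketch below builds the same labels from the edge list instead of typing them out, and uses a small made-up series (not from the question) to show which values stay unbinned.

```python
import pandas as pd

edges = ['00:00:00', '00:10:00', '00:20:00', '00:30:00',
         '00:40:00', '00:50:00', '00:60:00', '24:00:00']

# Build "(lower,upper]" labels from the edge strings; the final upper
# bound is written as inf to match the labels in the answer.
uppers = edges[1:-1] + ['inf']
labels = ['({},{}]'.format(lo, hi) for lo, hi in zip(edges[:-1], uppers)]

# Hypothetical test values: two inside the bins, one exactly zero,
# and one longer than 24 hours.
s = pd.Series(pd.to_timedelta(['00:17:00', '01:42:00',
                               '00:00:00', '30:00:00']))

binned = pd.cut(s, bins=pd.to_timedelta(edges), labels=labels)
# Values outside (00:00:00, 24:00:00] come back as NaN.
print(binned)
```

If zero-length or very long intervals can occur, adding include_lowest=True and replacing the last edge with pd.Timedelta.max would cover them.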
