Python 3.x: Generating conditional timedeltas across multiple rows and columns in Python

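I'm working with weather data and trying to calculate the number of minutes of daylight that correspond to each hourly observation in my time series: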

import numpy as np
import pandas as pd

London = pd.read_csv(root_dir + 'London.csv',
                     usecols=['date_time','London_sunrise','London_sunset'], 
                     parse_dates=['date_time'])

London.set_index(London['date_time'], inplace=True)

London['London_sunrise'] = pd.to_datetime(London['London_sunrise']).dt.strftime('%H:%M')
London['London_sunset'] = pd.to_datetime(London['London_sunset']).dt.strftime('%H:%M')
London['time'] = pd.to_datetime(London['date_time']).dt.strftime('%H:%M')

London['London_sun_mins'] = np.where(London['time']>=London['London_sunrise'], '60', '0')

London.head(7)
DataFrame:


date_time               time            London_sunrise  London_sunset   London_sun_mins     
2019-05-21 00:00:00     00:00           05:01           20:54           0
2019-05-21 01:00:00     01:00           05:01           20:54           0
2019-05-21 02:00:00     02:00           05:01           20:54           0
2019-05-21 03:00:00     03:00           05:01           20:54           0
2019-05-21 04:00:00     04:00           05:01           20:54           0
2019-05-21 05:00:00     05:00           05:01           20:54           0
2019-05-21 06:00:00     06:00           05:01           20:54           60
I tried a conditional to generate the minutes of sunshine for each hour: 60 for a full hour of daylight, 0 at night.
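Expressed with real datetimes, that 60-or-0 rule can be sketched as follows. This is a minimal sketch, not the poster's code: it assumes sunrise and sunset are stored as full datetimes (values taken from the table above) and that each timestamp labels the hour that starts at it; `df`, `sunrise`, `sunset` and `sun_mins` are illustrative names.

```python
import numpy as np
import pandas as pd

# Illustrative data: one day of hourly timestamps, with the sunrise/sunset
# values from the question stored as full datetimes instead of "HH:MM" strings.
df = pd.DataFrame({"date_time": pd.date_range("2019-05-21", periods=24, freq="h")})
df["sunrise"] = pd.Timestamp("2019-05-21 05:01")
df["sunset"] = pd.Timestamp("2019-05-21 20:54")

# Assumption: each timestamp labels the hour [date_time, date_time + 1h).
hour_start = df["date_time"]
hour_end = hour_start + pd.Timedelta(hours=1)

# A fully sunlit hour starts at or after sunrise and ends at or before sunset.
full_hour = (hour_start >= df["sunrise"]) & (hour_end <= df["sunset"])

# Integer literals (not the strings '60' and '0') keep the column numeric.
df["sun_mins"] = np.where(full_hour, 60, 0)
```

Hours that are only partly sunlit (here the 05:00 and 20:00 rows) still get 0 under this rule; counting their partial minutes needs the overlap between each hour and the sunrise-to-sunset window.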

When I try to use timedelta to take the difference between sunrise and the time (i.e. 05:00 and 05:01), I don't get the expected output (59).

A simple example:
London['London_sun_mins']=np.where(London['time']>=London['London_sunrise'],'60','0')

However, when I try to extend it to the following, which would get closer to the desired output:

London['London_sun_mins'] = np.where(London['time'] >= London['London_sunrise'], London['time'] - London['London_sunrise'], '0')

it returns the following error:
unsupported operand type(s) for -: 'str' and 'str'
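The error comes from `.dt.strftime('%H:%M')`, which turns the columns into plain strings: Python strings support `>=` (a lexicographic comparison, which happens to order zero-padded `HH:MM` values correctly) but not `-`. One way around it, sketched here under the assumption that the columns keep their `HH:MM` format, is to parse them with `pd.to_timedelta` (which expects `hh:mm:ss`, hence the appended `":00"`) and subtract the resulting timedeltas. The two sample rows are illustrative.

```python
import pandas as pd

# "HH:MM" strings, as produced by .dt.strftime('%H:%M') in the question
time = pd.Series(["05:00", "06:00"])
sunrise = pd.Series(["05:01", "05:01"])

# pd.to_timedelta expects "hh:mm:ss", so append the seconds before parsing
time_td = pd.to_timedelta(time + ":00")
sunrise_td = pd.to_timedelta(sunrise + ":00")

# Timedeltas can be subtracted; convert the result to (float) minutes
diff_mins = (time_td - sunrise_td).dt.total_seconds() / 60
print(diff_mins.tolist())  # [-1.0, 59.0]
```

This only shows the mechanics of the subtraction; turning the differences into minutes of sunlight per hour still requires deciding which hour each timestamp represents and capping the values between 0 and 60.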

Likewise, when I extend it to include both sunrise and sunset:

London['sunlightmins'] = London[(London['London_sunrise'] >= London['date_time'] & London['London_sunset'] <= London['date_time'])]

I get the same error. Any help reaching the desired output would be appreciated.
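Besides the string types, that filter has two further problems. In Python, `&` binds more tightly than `>=` and `<=`, so without parentheses `&` is applied to `London['date_time']` and `London['London_sunset']` before either comparison runs; each comparison must be wrapped in parentheses. Also, `London[mask]` returns the matching rows (a DataFrame), not one value per row, so it cannot be assigned as a column; to keep a per-row flag, assign the mask itself. A minimal sketch, assuming the sunrise/sunset columns have already been converted to datetimes (the `sunrise_dtime`/`sunset_dtime` names follow the answer below; the sample data is illustrative):

```python
import pandas as pd

# Illustrative hourly data with sunrise/sunset already parsed as datetimes
London = pd.DataFrame({"date_time": pd.date_range("2019-05-21", periods=24, freq="h")})
London["sunrise_dtime"] = pd.Timestamp("2019-05-21 05:01")
London["sunset_dtime"] = pd.Timestamp("2019-05-21 20:54")

# Parenthesise each comparison: `&` would otherwise be evaluated first
is_day = ((London["date_time"] >= London["sunrise_dtime"])
          & (London["date_time"] <= London["sunset_dtime"]))

# Keep the mask as a per-row flag column ...
London["is_day"] = is_day
# ... or use it for boolean indexing, which returns only the matching rows
daytime_rows = London[is_day]
```

With these values the daylight rows run from 06:00 to 20:00 inclusive.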

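I suggest working with datetime types so that you can take differences directly. You have converted the times to strings, which is why you get this error when you try to subtract them. If you keep datetime variables instead, you can subtract them directly as follows: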

# First I reproduce your dataset
import numpy as np
import pandas as pd
London = pd.DataFrame({"date_time": pd.date_range("2019-05-21", periods=7, freq="h"),
                       "London_sunrise": "05:01",
                       "London_sunset": "20:54"})
# Extract the date from date_time
London["date"] = London["date_time"].dt.date
# Then build sunrise and sunset strings that combine the date of date_time
# with the time from London_sunrise / London_sunset ("YYYY-MM-DD HH:MM:00")
London["sunrise_dtime"] = London.apply(lambda r: str(r["date"]) + " " +
                                       r["London_sunrise"] + ":00", axis=1)
London["sunset_dtime"] = London.apply(lambda r: str(r["date"]) + " " +
                                      r["London_sunset"] + ":00", axis=1)
# Convert them to datetime
London['sunrise_dtime'] = pd.to_datetime(London['sunrise_dtime'])
London['sunset_dtime'] = pd.to_datetime(London['sunset_dtime'])

# Now the two datetimes can be subtracted directly:
London['London_sun_mins'] = np.where(London['date_time'] >= London['sunrise_dtime'],
                                     London['date_time'] - London['sunrise_dtime'], 0)
The result is:

           date_time London_sunrise  ...        sunset_dtime London_sun_mins
0 2019-05-21 00:00:00          05:01  ... 2019-05-21 20:54:00        00:00:00
1 2019-05-21 01:00:00          05:01  ... 2019-05-21 20:54:00        00:00:00
2 2019-05-21 02:00:00          05:01  ... 2019-05-21 20:54:00        00:00:00
3 2019-05-21 03:00:00          05:01  ... 2019-05-21 20:54:00        00:00:00
4 2019-05-21 04:00:00          05:01  ... 2019-05-21 20:54:00        00:00:00
5 2019-05-21 05:00:00          05:01  ... 2019-05-21 20:54:00        00:00:00
6 2019-05-21 06:00:00          05:01  ... 2019-05-21 20:54:00        00:59:00
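Note that `London_sun_mins` above is the time elapsed since sunrise, so it equals 59 minutes at 06:00 only by coincidence; for later rows it keeps growing (07:00 would give 01:59:00), and `London_sunset` is never used. If the goal is the minutes of daylight within each hour (at most 60, and 0 at night), one option is to compute the overlap between each hour and the sunrise-to-sunset window. This is a minimal sketch, assuming each timestamp labels the hour that starts at it (so the 05:00 row gets 59); if your timestamps mark the end of the hour, use `hour_end = date_time` and `hour_start = date_time - 1h` instead. Building the datetimes from `dt.normalize()` plus `pd.to_timedelta` is a swapped-in, vectorised alternative to the `apply` calls above, not part of the original answer.

```python
import pandas as pd

# Sample frame shaped like the question's data (values taken from the post)
London = pd.DataFrame({
    "date_time": pd.date_range("2019-05-21", periods=24, freq="h"),
    "London_sunrise": "05:01",
    "London_sunset": "20:54",
})

# Midnight of each row's date plus the "HH:MM" offset gives a full datetime
midnight = London["date_time"].dt.normalize()
sunrise = midnight + pd.to_timedelta(London["London_sunrise"] + ":00")
sunset = midnight + pd.to_timedelta(London["London_sunset"] + ":00")

# Assumption: each timestamp labels the hour [date_time, date_time + 1h)
hour_start = London["date_time"]
hour_end = hour_start + pd.Timedelta(hours=1)

# Overlap of the hour with [sunrise, sunset]: later of the two starts,
# earlier of the two ends (Series.where keeps self where cond is True)
start = hour_start.where(hour_start > sunrise, sunrise)
end = hour_end.where(hour_end < sunset, sunset)

# Night hours give a negative overlap, which is clipped to 0 minutes
London["London_sun_mins"] = (
    (end - start).dt.total_seconds().div(60).clip(lower=0).astype(int)
)
```

With the question's values this gives 59 for the 05:00 row, 60 for the fully sunlit hours from 06:00 to 19:00, 54 for the 20:00 row and 0 overnight.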

Hope this helps.
