Python: How do I remove negative times from a DataFrame?
Tags: python, pandas, datetime, dataframe
I have created a DataFrame in pandas that shows the calculated total time taken to complete each work order. Because of human input errors, some of the calculated times come out negative, as in row 30. Even if I switched AM to PM, the time would still be wrong, because working hours are 07:30-16:00, so it is best to ignore these rows.
    Work Order     WorkType  AST                  AFT                  comp_time
10  BAEBRO-898690  RM        1900-01-01 06:27:41  1900-01-01 08:05:28  01:37:47
13  BAEBRO-914693  RM        1900-01-01 08:30:00  1900-01-01 09:00:00  00:30:00
27  BAEBRO-898787  RM        1900-01-01 10:00:00  1900-01-01 10:30:00  00:30:00
30  BAEBRO-914680  RM        1900-01-01 14:32:08  1900-01-01 10:37:17  -1 days +20:05:09
37  BAEBRO-914660  RM        1900-01-01 10:47:39  1900-01-01 11:32:02  00:44:23
The code I used to get this result is:
import pandas as pd
from datetime import time
from datetime import timedelta
from pandas import DataFrame
import matplotlib.pyplot as plt
df = pd.read_excel('C:/Users/Nativ_Zero/Desktop/work data/July.xls')
df_work = df[['Work Order', 'WorkType', 'AST','AFT']]
df_work['AFT'] = pd.to_datetime(df_work['AFT'], format='%H:%M:%S', errors='coerce')
df_work['AST'] = pd.to_datetime(df_work['AST'], format='%H:%M:%S', errors='coerce')
rm_work = df_work[df_work.WorkType == 'RM']
rm_work['comp_time'] = rm_work['AFT'] - rm_work['AST']
rm_work.head()
The following code should work for you:
import pandas as pd

df = pd.read_excel('C:/Users/Nativ_Zero/Desktop/work data/July.xls')
# .copy() avoids SettingWithCopyWarning when new columns are assigned below
df_work = df[['Work Order', 'WorkType', 'AST', 'AFT']].copy()
df_work['AFT'] = pd.to_datetime(df_work['AFT'], format='%H:%M:%S', errors='coerce')
df_work['AST'] = pd.to_datetime(df_work['AST'], format='%H:%M:%S', errors='coerce')
rm_work = df_work[df_work.WorkType == 'RM'].copy()
rm_work['comp_time'] = rm_work['AFT'] - rm_work['AST']
# Filtering condition: keep only rows whose duration is zero or positive
rm_work = rm_work[rm_work.comp_time >= pd.Timedelta(0)]
rm_work.head()
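As a quick check of the filtering condition, here is a minimal sketch that rebuilds three rows of the question's table by hand instead of reading the Excel file. The literal values are copied from the table above; the variable name `non_negative` is only illustrative.

```python
import pandas as pd

# Three rows copied from the question's table (time of day only).
rm_work = pd.DataFrame({
    "Work Order": ["BAEBRO-898690", "BAEBRO-914680", "BAEBRO-914660"],
    "AST": ["06:27:41", "14:32:08", "10:47:39"],
    "AFT": ["08:05:28", "10:37:17", "11:32:02"],
})

# Parse both time columns the same way the question does.
for col in ("AST", "AFT"):
    rm_work[col] = pd.to_datetime(rm_work[col], format="%H:%M:%S", errors="coerce")

# Duration of each work order as a timedelta column.
rm_work["comp_time"] = rm_work["AFT"] - rm_work["AST"]

# Keep only rows whose duration is zero or positive.
non_negative = rm_work[rm_work["comp_time"] >= pd.Timedelta(0)]
print(non_negative[["Work Order", "comp_time"]])
```

The middle row (finish time earlier than start time) is the one that gets dropped.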
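You need to compare against the appropriate data type, which in this case is a Timedelta. Use .apply to check whether each pandas time value is negative. Make sure you compare against pd.Timedelta(0) and not just 0, because comparing with 0 raises an error. If the value is negative, return NumPy NaN. Finally, drop the rows that contain NaN.
If your column already contains NaNs that you want to keep, this will cause problems! In that case you can change the function to return some other value and then exclude that unique value.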
import numpy as np
import pandas as pd

def check_if_negative(pd_time):
    if pd_time >= pd.Timedelta(0):  # positive time and 0 time
        return pd_time
    elif pd_time < pd.Timedelta(0):  # negative time
        return np.nan  # np.NaN was removed in NumPy 2.0
    else:
        # reached for missing values (NaT), where both comparisons are False
        print(f'problem! {pd_time} has an issue')  # quick error check

rm_work['comp_time'] = rm_work['AFT'] - rm_work['AST']  # create timedelta
rm_work['comp_time'] = rm_work.comp_time.apply(check_if_negative)  # apply check to column
rm_work = rm_work.dropna(subset=['comp_time'])  # delete rows with NaN
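If the column already contains missing durations that should be kept, one possible alternative to a sentinel value is a boolean mask that keeps missing entries explicitly. This is only a sketch on made-up data; the names `df`, `keep` and `kept` are hypothetical and not from the answer above.

```python
import pandas as pd

# Made-up durations: one valid, one already missing (NaT), one negative, one valid.
df = pd.DataFrame({
    "comp_time": pd.Series(
        [
            pd.Timedelta("01:37:47"),
            pd.NaT,
            pd.Timedelta(hours=-4),
            pd.Timedelta("00:44:23"),
        ],
        dtype="timedelta64[ns]",
    )
})

# NaT >= Timedelta(0) evaluates to False, so missing rows are kept via isna().
keep = df["comp_time"].isna() | (df["comp_time"] >= pd.Timedelta(0))
kept = df[keep]
print(kept)
```

Only the negative row is removed here, and the row that was missing to begin with survives.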
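You can filter out the negative values: rm_work = rm_work[rm_work.com_time >= 0]
I entered this code and got: AttributeError: 'DataFrame' object has no attribute 'com_time'
It's comp_time, @PaulLane.
Sorry, I should have noticed that. Now I get: TypeError: cannot compare a TimedeltaIndex with type int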
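The TypeError in the last comment comes from comparing a timedelta column with the integer 0. If a comparison against a plain number is preferred, one option is to convert the durations to seconds first. The sketch below uses made-up values, and the names `seconds` and `mask` are illustrative.

```python
import pandas as pd

# Made-up durations: 30 minutes and minus 4 hours.
comp_time = pd.Series(
    [pd.Timedelta(minutes=30), pd.Timedelta(hours=-4)],
    dtype="timedelta64[ns]",
)

# total_seconds() returns floats, which can be compared with a plain 0.
seconds = comp_time.dt.total_seconds()
mask = seconds >= 0

# Boolean indexing with the mask keeps only the non-negative durations.
print(comp_time[mask])
```

Comparing against `pd.Timedelta(0)` directly, as in the answers above, gives the same mask without the conversion step.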