Python: delete rows when a column contains a particular kind of value
I have a DataFrame like this:
UNIT EXITSn_hourly Interval
1867 R081 104 00:00:00-04:00:00
1868 R081 0 04:00:00-04:00:00
1869 R081 129 04:00:00-08:00:00
1870 R081 521 08:00:00-12:00:00
1871 R081 1048 12:00:00-16:00:00
2838 R032 38 00:00:00-04:00:00
2839 R032 0 04:00:00-04:00:00
2840 R032 89 04:00:00-08:00:00
2841 R032 470 08:00:00-12:00:00
I need to delete the entire row when Interval has this particular format, where the start and end times are the same:
1868 R081 0 04:00:00-04:00:00
I want to delete not only 04:00:00-04:00:00 but also similar values, such as 01:00:00-01:00:00.
This is actually my original df, from which I created the Interval column:
C/A UNIT SCP DATEn TIMEn DESCn ENTRIESn EXITSn
0 A002 R051 02-00-00 06-29-13 00:00:00 REGULAR 4174592 1433672
1 A002 R051 02-00-00 06-29-13 04:00:00 REGULAR 4174628 1433675
2 A002 R051 02-00-00 06-29-13 08:00:00 REGULAR 4174641 1433706
3 A002 R051 02-00-00 06-29-13 12:00:00 REGULAR 4174741 1433775
4 A002 R051 02-00-00 06-29-13 16:00:00 REGULAR 4174936 1433826
5 A002 R051 02-00-00 06-29-13 20:00:00 REGULAR 4175270 1433877
6 A002 R051 02-00-00 06-30-13 00:00:00 REGULAR 4175403 1433908
7 A002 R051 02-00-00 06-30-13 04:00:00 REGULAR 4175441 1433914
8 A002 R051 02-00-00 06-30-13 08:00:00 REGULAR 4175457 1433928
9 A002 R051 02-00-00 06-30-13 12:00:00 REGULAR 4175520 1433981
I created Interval with this code:
import copy

# Work on a copy so turnstile_data itself is left untouched.
df = copy.deepcopy(turnstile_data)
# pdf holds the previous row's values, aligned with the current row.
pdf = df.shift(periods=1)
df['ENTRIESn_hourly'] = df['ENTRIESn'] - pdf['ENTRIESn'].fillna(0)
df['EXITSn_hourly'] = df['EXITSn'] - pdf['EXITSn'].fillna(0)
# Interval is "previous time-current time".
df['Interval'] = pdf['TIMEn']+'-'+ df['TIMEn'].fillna(0)
df.loc[(df['ENTRIESn'] == 0), 'ENTRIESn_hourly'] = 0
df.loc[(df['EXITSn'] == 0), 'EXITSn_hourly'] = 0
# Zero out rows where the turnstile (C/A, UNIT, SCP) differs from the previous row.
df.loc[(df['C/A'] != pdf['C/A']) | (df['UNIT'] != pdf['UNIT']) | (df['SCP'] != pdf['SCP']), ['ENTRIESn_hourly', 'EXITSn_hourly','Interval']] = 0
# Drop the rows that were zeroed out above.
df = df[df.Interval != 0]
print(df.head(20))
head7=copy.deepcopy(df)
required_df=head7[['UNIT','EXITSn_hourly','Interval']].groupby(head7.UNIT)
print(required_df.head(5))
You may want to split Interval into Interval_start and Interval_end and check whether they are equal:
df['Interval_start'] = df['Interval'].map(lambda s: s.split('-')[0])
df['Interval_end'] = df['Interval'].map(lambda s: s.split('-')[1])
df.query("Interval_start != Interval_end")
UNIT EXITSn_hourly Interval Interval_start Interval_end
1867 R081 104 00:00:00-04:00:00 00:00:00 04:00:00
1869 R081 129 04:00:00-08:00:00 04:00:00 08:00:00
1870 R081 521 08:00:00-12:00:00 08:00:00 12:00:00
1871 R081 1048 12:00:00-16:00:00 12:00:00 16:00:00
2838 R032 38 00:00:00-04:00:00 00:00:00 04:00:00
2840 R032 89 04:00:00-08:00:00 04:00:00 08:00:00
2841 R032 470 08:00:00-12:00:00 08:00:00 12:00:00
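The same check can also be written without `map` and a lambda, and without adding helper columns to the frame. A minimal sketch, assuming every Interval value is a string of the form start-end with a single hyphen; the sample frame only reproduces a few rows from the question, and `bounds` and `filtered` are illustrative names:

```python
import pandas as pd

# A few rows copied from the question's example.
df = pd.DataFrame(
    {
        "UNIT": ["R081", "R081", "R081", "R032", "R032"],
        "EXITSn_hourly": [104, 0, 129, 38, 0],
        "Interval": [
            "00:00:00-04:00:00",
            "04:00:00-04:00:00",
            "04:00:00-08:00:00",
            "00:00:00-04:00:00",
            "04:00:00-04:00:00",
        ],
    },
    index=[1867, 1868, 1869, 2838, 2839],
)

# Split once on the hyphen into two columns: 0 = start, 1 = end.
bounds = df["Interval"].str.split("-", n=1, expand=True)

# Keep only the rows whose start and end differ.
filtered = df[bounds[0] != bounds[1]]
print(filtered)
```

Because `bounds` is a separate frame, `df` keeps its original three columns.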
You can compare parts of the strings and then drop the rows by subsetting:
print(df.Interval.str[0:2])
1867 00
1868 04
1869 04
1870 08
1871 12
2838 00
2839 04
2840 04
2841 08
Name: Interval, dtype: object
print(df.Interval.str[0:2] != df.Interval.str[9:11])
1867 True
1868 False
1869 True
1870 True
1871 True
2838 True
2839 False
2840 True
2841 True
Name: Interval, dtype: bool
print(df[df.Interval.str[0:2] != df.Interval.str[9:11]])
UNIT EXITSn_hourly Interval
1867 R081 104 00:00:00-04:00:00
1869 R081 129 04:00:00-08:00:00
1870 R081 521 08:00:00-12:00:00
1871 R081 1048 12:00:00-16:00:00
2838 R032 38 00:00:00-04:00:00
2840 R032 89 04:00:00-08:00:00
2841 R032 470 08:00:00-12:00:00
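Note that `str[0:2]` and `str[9:11]` compare only the hour fields, so an interval such as 04:00:00-04:30:00 would be dropped as well. If the full timestamps have to match, one option is a regular expression with a backreference. A sketch, assuming every Interval is a string in HH:MM:SS-HH:MM:SS form; the 04:00:00-04:30:00 row is made up for illustration and does not appear in the question:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "UNIT": ["R081", "R081", "R081", "R081"],
        "EXITSn_hourly": [104, 0, 129, 7],
        "Interval": [
            "00:00:00-04:00:00",
            "04:00:00-04:00:00",
            "04:00:00-08:00:00",
            "04:00:00-04:30:00",  # hypothetical row: same hour, different minutes
        ],
    }
)

# \1 must repeat exactly what the first group captured, so the pattern
# matches only intervals whose start and end are the same timestamp.
same_bounds = df["Interval"].str.match(r"^(\d{2}:\d{2}:\d{2})-\1$")

# Negate the mask to keep every other row.
result = df[~same_bounds]
print(result)
```

Here the hypothetical 04:00:00-04:30:00 row is kept, whereas the hour-only slice comparison would remove it.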
Edit:
As for whether this is an efficient approach or there is a better one: I looked at your code, and you can probably omit copy.deepcopy and use:
df = turnstile_data.copy(deep=True)
df['ENTRIESn_hourly'] = (df['ENTRIESn'] - df['ENTRIESn'].shift(periods=1)).fillna(0)
df['EXITSn_hourly'] = (df['EXITSn'] - df['EXITSn'].shift(periods=1)).fillna(0)
df['Interval'] = (df['TIMEn'].shift(periods=1)+'-'+ df['TIMEn']).fillna(0)
df.loc[(df['ENTRIESn'] == 0), 'ENTRIESn_hourly'] = 0
df.loc[(df['EXITSn'] == 0), 'EXITSn_hourly'] = 0
df.loc[(df['C/A'] != df['C/A'].shift(periods=1)) |
(df['UNIT'] != df['UNIT'].shift(periods=1)) |
(df['SCP'] != df['SCP'].shift(periods=1)),
['ENTRIESn_hourly', 'EXITSn_hourly','Interval']] = 0
print(df.head(5))
ENTRIESn_hourly EXITSn_hourly Interval
0 0 0 0
1 36 3 00:00:00-04:00:00
2 13 31 04:00:00-08:00:00
3 100 69 08:00:00-12:00:00
4 195 51 12:00:00-16:00:00
required_df=df[['UNIT','EXITSn_hourly','Interval']].groupby(df.UNIT)
print(required_df.head(5))
UNIT EXITSn_hourly Interval
0 R051 0 0
1 R051 3 00:00:00-04:00:00
2 R051 31 04:00:00-08:00:00
3 R051 69 08:00:00-12:00:00
4 R051 51 12:00:00-16:00:00
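Another way to avoid comparing each key column with its shifted copy is to let groupby compute the differences per turnstile. This is only a sketch, assuming a turnstile is identified by the combination of C/A, UNIT and SCP and that rows are already in time order within each turnstile. The small input frame, including the A003/R052 rows, is invented for illustration. Unlike the code above, it does not apply the extra rule that zeroes the hourly value when the raw counter is 0, and it drops the first row of each turnstile instead of setting it to 0:

```python
import pandas as pd

# Invented sample data with two turnstiles.
turnstile_data = pd.DataFrame(
    {
        "C/A": ["A002"] * 3 + ["A003"] * 2,
        "UNIT": ["R051"] * 3 + ["R052"] * 2,
        "SCP": ["02-00-00"] * 5,
        "TIMEn": ["00:00:00", "04:00:00", "08:00:00", "00:00:00", "04:00:00"],
        "ENTRIESn": [4174592, 4174628, 4174641, 100, 150],
        "EXITSn": [1433672, 1433675, 1433706, 10, 30],
    }
)

df = turnstile_data.copy()
keys = ["C/A", "UNIT", "SCP"]
grouped = turnstile_data.groupby(keys, sort=False)

# diff() restarts in every group, so each turnstile's first row is NaN -> 0.
df["ENTRIESn_hourly"] = grouped["ENTRIESn"].diff().fillna(0).astype(int)
df["EXITSn_hourly"] = grouped["EXITSn"].diff().fillna(0).astype(int)

# Previous reading time within the same turnstile; missing on each first row.
prev_time = grouped["TIMEn"].shift(1)
df["Interval"] = prev_time + "-" + df["TIMEn"]

# Drop each turnstile's first row, which has no previous reading.
df = df.dropna(subset=["Interval"])
print(df[["UNIT", "EXITSn_hourly", "Interval"]])
```

Since the differences never cross a turnstile boundary, no separate step is needed to reset values where C/A, UNIT or SCP changes.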