Using Python to set the values in all columns of a DataFrame to None for specific values in a specific column
I have a dataframe that looks like this:
time c1 c2
1 2017-07-23 11:39:10 3.385661 3.193302
2 2017-07-23 11:39:20 3.157000 2.912690
3 2017-07-23 11:39:30 3.277145 3.124290
4 2017-07-23 11:39:40 3.126075 2.982679
5 2017-07-23 11:39:50 3.135766 2.985840
6 2017-07-23 11:40:00 3.166134 3.016147
7 2017-07-23 11:40:10 2.487507 2.256214
8 2017-07-23 11:40:20 3.348368 3.158728
9 2017-07-23 11:40:30 3.219001 2.996357
10 2017-07-23 11:40:40 2.862558 2.711170
11 2017-07-23 11:40:50 2.558438 2.346303
12 2017-07-23 11:41:00 3.338989 3.192018
13 2017-07-23 11:41:10 2.674149 2.496557
14 2017-07-23 11:41:20 3.523231 3.315889
15 2017-07-23 11:41:30 2.931527 2.740840
16 2017-07-23 11:41:40 3.078464 2.938004
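Problem 1: I want to set the values in c1 and c2 to None for rows whose time lies between two specific times.
For Problem 1, what I tried was to get the indices of all rows whose time lies within the two given times, and then change the values: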
index_list = df[(df['time'] >= start_time) & (df['time'] <= end_time)].index.tolist()
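To finish the index-based approach described above, the collected labels can be passed back to `loc` together with the value columns. This is a minimal sketch. It assumes a `time` column of datetimes and value columns `c1`/`c2`, as in the sample. The six-row excerpt and the chosen window are illustrative only.

```python
import numpy as np
import pandas as pd

# Small excerpt of the sample data from the question (index starts at 1 as shown)
df = pd.DataFrame({
    'time': pd.to_datetime(['2017-07-23 11:39:10', '2017-07-23 11:39:20',
                            '2017-07-23 11:39:30', '2017-07-23 11:39:40',
                            '2017-07-23 11:39:50', '2017-07-23 11:40:00']),
    'c1': [3.385661, 3.157000, 3.277145, 3.126075, 3.135766, 3.166134],
    'c2': [3.193302, 2.912690, 3.124290, 2.982679, 2.985840, 3.016147],
}, index=range(1, 7))

start_time = pd.Timestamp('2017-07-23 11:39:20')
end_time = pd.Timestamp('2017-07-23 11:39:40')

# Step 1: collect the index labels of rows whose time lies inside the window (inclusive)
index_list = df[(df['time'] >= start_time) & (df['time'] <= end_time)].index.tolist()

# Step 2: blank out every column except 'time' for those rows
value_cols = df.columns.drop('time')
df.loc[index_list, value_cols] = np.nan

print(index_list)  # [2, 3, 4]
print(df)
```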
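You can change the values to NaN instead; assigning None is actually problematic.
You can use set_index to get a DatetimeIndex, then select the rows by loc and set them to NaN.
Replacing float values is a bit problematic because of floating-point precision. So use the helper function numpy.isclose to build a boolean mask, and replace the matching values with mask: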
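import numpy as np
import pandas as pd

#if necessary convert to datetime
#df['time'] = pd.to_datetime(df['time'])
df = df.set_index('time')
# label slicing on a DatetimeIndex includes both endpoints
df.loc['2017-07-23 11:39:20':'2017-07-23 11:39:50'] = np.nan
df.loc['2017-07-23 11:40:20':'2017-07-23 11:40:50'] = np.nan
# replace every value approximately equal to 3.38566 by NaN
df = df.mask(np.isclose(df.values, 3.38566))
print (df)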
c1 c2
time
2017-07-23 11:39:10 NaN 3.193302
2017-07-23 11:39:20 NaN NaN
2017-07-23 11:39:30 NaN NaN
2017-07-23 11:39:40 NaN NaN
2017-07-23 11:39:50 NaN NaN
2017-07-23 11:40:00 3.166134 3.016147
2017-07-23 11:40:10 2.487507 2.256214
2017-07-23 11:40:20 NaN NaN
2017-07-23 11:40:30 NaN NaN
2017-07-23 11:40:40 NaN NaN
2017-07-23 11:40:50 NaN NaN
2017-07-23 11:41:00 3.338989 3.192018
2017-07-23 11:41:10 2.674149 2.496557
2017-07-23 11:41:20 3.523231 3.315889
2017-07-23 11:41:30 2.931527 2.740840
2017-07-23 11:41:40 3.078464 2.938004
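The precision caveat can be seen with a tiny example. A value produced by arithmetic may not compare equal to the literal you type, while np.isclose (default rtol=1e-05, atol=1e-08) still matches it. The answer's target 3.38566 differs from the stored 3.385661 but falls within that default tolerance, which is why the first c1 value becomes NaN. The two-row frame below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the first value is computed, not typed
df = pd.DataFrame({'c1': [0.1 + 0.2, 3.385661], 'c2': [1.0, 2.0]})

# Exact comparison fails: 0.1 + 0.2 is stored as 0.30000000000000004
print((df == 0.3).values.any())  # False

# np.isclose compares within a tolerance, so the computed value is matched
mask = np.isclose(df.values, 0.3)
print(df.mask(mask))

# The default tolerance is wide enough that 3.38566 also matches 3.385661
print(np.isclose(3.385661, 3.38566))  # True
```

If you need exact matches only, compare with == and skip isclose.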
I solved Problem 1 using the following:
start_time = '2017-07-23 11:40:20'
end_time = '2017-07-23 11:40:50'
df.loc[(df['time'] >= start_time) & (df['time'] <= end_time), df.columns != 'time'] = None
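The boolean-mask approach works without setting an index. This is a minimal sketch. It assumes time is a datetime column; strings in this ISO format also compare correctly, but parsing avoids surprises. It also shows that, as we understand pandas, assigning None into float64 columns stores NaN, so the result matches assigning np.nan. The four-row frame is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'time': pd.to_datetime(['2017-07-23 11:40:10', '2017-07-23 11:40:20',
                            '2017-07-23 11:40:30', '2017-07-23 11:41:00']),
    'c1': [2.487507, 3.348368, 3.219001, 3.338989],
    'c2': [2.256214, 3.158728, 2.996357, 3.192018],
})

start_time = '2017-07-23 11:40:20'
end_time = '2017-07-23 11:40:50'

# Boolean row mask: pandas parses the strings when comparing to a datetime64 column
in_window = (df['time'] >= start_time) & (df['time'] <= end_time)

# Boolean column mask: every column except 'time'
df.loc[in_window, df.columns != 'time'] = None

print(df)
print(df.dtypes)  # c1 and c2 remain float64; the None values are stored as NaN
```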
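I don't want to make the time column the index. I also don't want to use isclose(), because I am looking for exact values. Could you explain more?
If you use floats, what is the precision?
Exact numbers. I will give the value as input, so there is no need to worry about precision. For example, suppose the input is 3.166134: then in all columns except the time column mentioned above, set 3.166134 to None. Again, I don't want the time column to be the index. I couldn't understand the solution for Problem 2.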
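For Problem 2 as described in the comment, the input is an exact value, time must stay a regular column, and isclose is not wanted. One option is DataFrame.mask with an exact == comparison, applied only to the non-time columns. This is a minimal sketch; target is a hypothetical name for the input value. An exact match only succeeds if the stored float is the same double as the literal. That holds when both are parsed from the same decimal text, such as 3.166134.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'time': pd.to_datetime(['2017-07-23 11:39:50', '2017-07-23 11:40:00',
                            '2017-07-23 11:40:10']),
    'c1': [3.135766, 3.166134, 2.487507],
    'c2': [2.985840, 3.016147, 3.166134],
})

target = 3.166134  # hypothetical input value to blank out

# Work only on the columns other than 'time'; 'time' stays a normal column
cols = df.columns.drop('time')

# Exact comparison, no tolerance: cells equal to target become NaN
df[cols] = df[cols].mask(df[cols] == target)

print(df)
```

df[cols].replace(target, np.nan) is an equivalent exact-match alternative.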