Python: conditionally add to certain row values - pandas
I have a dataset with a column of values, but some rows in that column contain outliers (such as -9999999 or 9999999) caused by a system error, and I would like to correct them in Pandas. The original dataset looks like this:
Value Column
-2092.925951
910.9736
-910.9736
-2024.96475
-2024.96475
999947.438 - (outlier)
67.4672
-999993.313 - (outlier)
9.8603
49.5318
17.5591
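I only want to add 1000000 to rows whose value is between -800000 and -999999, and subtract 1000000 from rows whose value is between 800000 and 999999.
An example of the desired dataset: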
Value Column
-2092.925951
910.9736
-910.9736
-2024.96475
-2024.96475
-52.562 - (fixed outlier with 999,947.438 - 1,000,000)
67.4672
6.687 - (fixed outlier with -999,993.313 + 1,000,000)
9.8603
49.5318
17.5591
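To make the example reproducible, the input and the desired output can be built in pandas and compared. This is a minimal sketch that assumes the column is named 'Value Column' as in the post. The names `values`, `expected` and `correction` are illustrative only. Subtracting the original column from the desired one shows the per-row correction: -1,000,000 for row 5, +1,000,000 for row 7, and 0 for every other row.

```python
import pandas as pd

# Sample data from the question; the column name 'Value Column' comes from the post
values = [-2092.925951, 910.9736, -910.9736, -2024.96475, -2024.96475,
          999947.438, 67.4672, -999993.313, 9.8603, 49.5318, 17.5591]
df = pd.DataFrame({'Value Column': values})

# Desired result: only rows 5 and 7 change
expected = df['Value Column'].copy()
expected[5] = 999947.438 - 1000000    # -52.562
expected[7] = -999993.313 + 1000000   # 6.687

# Per-row correction = desired value - original value (rounded to hide float noise)
correction = (expected - df['Value Column']).round(6)
print(correction.tolist())
```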
Any help or ideas would be greatly appreciated.
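import pandas as pd

# df has a single numeric column named 'Value Column'
(
df.assign(l=df['Value Column'].between(800000, 999999) * -1000000)   # -1000000 where the value is a positive outlier, else 0
  .assign(s=df['Value Column'].between(-999999, -800000) * 1000000)  # +1000000 where the value is a negative outlier, else 0
  .apply('sum', axis=1)                                               # row sum: original value + l + s
)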
0 -2092.925951
1 910.973600
2 -910.973600
3 -2024.964750
4 -2024.964750
5 -52.562000
6 67.467200
7 6.687000
8 9.860300
9 49.531800
10 17.559100
dtype: float64
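The same boolean masks can also write the correction straight back into the column instead of returning a new Series. This is a minimal sketch that assumes the 'Value Column' name from the question and keeps the inclusive bounds of `Series.between` used above. The names `too_high` and `too_low` are illustrative only. Both masks are computed before anything is modified, so a row that has already been corrected cannot be matched again.

```python
import pandas as pd

values = [-2092.925951, 910.9736, -910.9736, -2024.96475, -2024.96475,
          999947.438, 67.4672, -999993.313, 9.8603, 49.5318, 17.5591]
df = pd.DataFrame({'Value Column': values})

col = df['Value Column']
# Build both masks from the original values first (inclusive bounds, like between())
too_high = col.between(800000, 999999)
too_low = col.between(-999999, -800000)

# Update only the masked rows, in place
df.loc[too_high, 'Value Column'] -= 1000000
df.loc[too_low, 'Value Column'] += 1000000
print(df)
```

Unlike the `assign`/`apply('sum', axis=1)` chain, this does not depend on the frame having only one column, because it never sums across columns.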
Treating the Value Column as VC:
In [8]: import pandas as pd
   ...: l = [-2092.925951, 910.9736, -910.9736, -2024.96475, -2024.96475,
   ...:      999947.438, 67.4672, -999993.313, 9.8603, 49.5318, 17.5591]
In [9]: df = pd.DataFrame.from_dict({'VC': l})
In [10]: def check(value):
    ...:     # positive outlier: shift it back down by 1,000,000
    ...:     if 800000 <= value <= 999999:
    ...:         return value - 1000000
    ...:     # negative outlier: shift it back up by 1,000,000
    ...:     elif -999999 <= value <= -800000:
    ...:         return value + 1000000
    ...:     return value
    ...:
    ...: df['VC'] = df['VC'].apply(check)
    ...:
In [11]: df
Out[11]:
VC
0 -2092.925951
1 910.973600
2 -910.973600
3 -2024.964750
4 -2024.964750
5 -52.562000
6 67.467200
7 6.687000
8 9.860300
9 49.531800
10 17.559100
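Because `check` is called once per row, `apply` is typically slower than a vectorized expression on large frames. This sketch uses `numpy.select` to do the same thing and assumes the `VC` column name from above plus the question's inclusive bounds. Each condition selects a correction of -1,000,000 or +1,000,000, and rows that match neither condition get the default of 0. The correction is then added to the column in one step.

```python
import numpy as np
import pandas as pd

values = [-2092.925951, 910.9736, -910.9736, -2024.96475, -2024.96475,
          999947.438, 67.4672, -999993.313, 9.8603, 49.5318, 17.5591]
df = pd.DataFrame({'VC': values})

vc = df['VC']
conditions = [
    vc.between(800000, 999999),     # positive outliers
    vc.between(-999999, -800000),   # negative outliers
]
corrections = [-1000000, 1000000]

# np.select returns, per row, the correction of the first matching condition (0 if none)
df['VC'] = vc + np.select(conditions, corrections, default=0)
print(df)
```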
This looks like a good option too, I'll give it a try, thanks!