Python: rename multiple DataFrame values based on a specific value
This is my DataFrame:
A i j
0 O-20-003049 NaN NaN
1 1 0.643284 0.834937
2 2 0.056463 0.394168
3 3 0.773379 0.057465
4 4 0.081585 0.178991
5 5 0.667667 0.004370
6 6 0.672313 0.587615
7 O-20-003104 NaN NaN
8 1 0.916426 0.739700
9 O-20-003117 NaN NaN
10 1 0.800776 0.614192
11 2 0.925186 0.980913
12 3 0.503419 0.775606
I want to rename the values in column A so that I get the following result:
A x y
0 O-20-003049.01 0.593312 0.666600
1 O-20-003049.02 0.554129 0.435650
2 O-20-003049.03 0.900707 0.623963
3 O-20-003049.04 0.023075 0.445153
4 O-20-003049.05 0.307908 0.503038
5 O-20-003049.06 0.844624 0.710027
6 O-20-003104.01 0.026914 0.091458
7 O-20-003117.01 0.275906 0.398993
8 O-20-003117.02 0.101117 0.691897
9 O-20-003117.03 0.739183 0.213401
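This is what I have so far (thanks to 科雷恩's help).
When I run it, I get the following error message:
Cannot mask with non-boolean array containing NA / NaN values
I then tried adding "== True" to the boolean mask:
mask = df1["A"].str.startswith("O-") == True
This gets rid of the error message, but the output is still incorrect: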
A
0 O-21-002001.O-21-002001
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 O-21-002002.O-21-002002
7 NaN
8 NaN
Any ideas would be greatly appreciated.
Let's try:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': {0: 'O-20-003049', 1: '1', 2: '2', 3: '3',
                         4: '4', 5: '5', 6: '6', 7: 'O-20-003104',
                         8: '1', 9: 'O-20-003117',
                         10: '1', 11: '2', 12: '3'},
                   'i': {0: np.nan, 1: 0.643284, 2: 0.056463,
                         3: 0.773379, 4: 0.081585, 5: 0.667667,
                         6: 0.672313, 7: np.nan, 8: 0.916426,
                         9: np.nan, 10: 0.800776, 11: 0.925186,
                         12: 0.503419},
                   'j': {0: np.nan, 1: 0.834937, 2: 0.394168,
                         3: 0.057465, 4: 0.178991, 5: 0.00437,
                         6: 0.587615, 7: np.nan, 8: 0.7397, 9: np.nan,
                         10: 0.614192, 11: 0.980913, 12: 0.775606}})

# Mask of the numeric values in A (True for '1', '2', ...)
m = df['A'].str.isnumeric()

# Replace the numeric values in A with NaN, then forward-fill the labels
df['A'] = np.where(m, np.nan, df['A'])
df['A'] = df['A'].ffill()

# Filter out the rows that carry no data
df = df[m].reset_index(drop=True)

# Append the 1/100th marker to A
df['A'] = df['A'] + \
    ((df.groupby((~m).cumsum()).cumcount() + 1) / 100).astype(str)

print(df)
Output:
A i j
0 O-20-0030490.01 0.643284 0.834937
1 O-20-0030490.02 0.056463 0.394168
2 O-20-0030490.03 0.773379 0.057465
3 O-20-0030490.04 0.081585 0.178991
4 O-20-0030490.05 0.667667 0.004370
5 O-20-0030490.06 0.672313 0.587615
6 O-20-0031040.07 0.916426 0.739700
7 O-20-0031170.01 0.800776 0.614192
8 O-20-0031170.02 0.925186 0.980913
9 O-20-0031170.01 0.503419 0.775606
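Note that the output above does not quite match the format requested in the question. The suffix is appended as the string form of a float (0.01), which gives O-20-0030490.01 rather than O-20-003049.01, and the counter for O-20-003104 continues at 0.07 instead of restarting. The latter appears to happen because m still carries the original 13-row index after reset_index, so the group keys are aligned against the wrong rows. Below is a minimal sketch of one way to get the requested labels. The shortened sample frame and the names is_data, labels, out and suffix are illustrative assumptions rather than part of the original answer, and the sketch assumes each "O-" label occurs in only one block.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the frame above (values rounded for illustration)
df = pd.DataFrame({
    "A": ["O-20-003049", "1", "2", "O-20-003104", "1",
          "O-20-003117", "1", "2", "3"],
    "i": [np.nan, 0.64, 0.06, np.nan, 0.92, np.nan, 0.80, 0.93, 0.50],
})

# True on data rows ("1", "2", ...), False on header rows ("O-...")
is_data = df["A"].str.isnumeric()

# Keep only the header labels, then carry each one down over its data rows
labels = df["A"].where(~is_data).ffill()

# Keep the data rows and attach their labels before resetting the index
out = df[is_data].assign(A=labels[is_data]).reset_index(drop=True)

# Number the rows within each label and append a zero-padded suffix
suffix = (out.groupby("A").cumcount() + 1).astype(str).str.zfill(2)
out["A"] = out["A"] + "." + suffix

print(out)
```

Grouping on the label column itself after filtering avoids relying on a mask whose index no longer matches the frame, and building the suffix as a string avoids the leading "0" that a float conversion introduces.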
You could also use np.where -> np.where(df.A.str.isnumeric(), np.nan, df.A) for better performance.
You are correct, np.where is noticeably more performant. I have updated my answer. Thanks.
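On the error message in the question: .str.startswith returns NaN for entries that are not strings, so presumably df1["A"] holds a mix of strings and numbers, and a mask containing NaN cannot be used for boolean indexing. Comparing with == True turns those NaN entries into False, and the na=False argument of str.startswith does the same more directly. A small sketch, using a made-up Series s in place of df1["A"] (an assumption, since df1 itself is not shown):

```python
import pandas as pd

# Hypothetical column mixing strings and integers, standing in for df1["A"]
s = pd.Series(["O-21-002001", 1, 2, "O-21-002002", 1], dtype=object)

# Without na=..., non-string entries produce NaN in the mask
raw = s.str.startswith("O-")

# With na=False, those entries become False and the mask is usable
mask = s.str.startswith("O-", na=False)

print(raw.tolist())
print(mask.tolist())
print(s[mask].tolist())
```

This only fixes the mask. The renaming logic still needs the labels to be carried down onto the data rows, as the answer does with ffill.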