Python: check each value in one column against the value in another column of the same DataFrame
I have the following DataFrame:
import pandas as pd
# Named `data` so the built-in `dict` is not shadowed
data = {'val1': ["3.2", "2.4", "-2.3", "-4.9", "0"],
        'class': ["1", "0", "0", "0", "1"],
        'val2': ["3.2", "2.7", "1.7", "-7.1", "0"]}
df = pd.DataFrame(data)
df
val1 class val2
0 3.2 1 3.2
1 2.4 0 2.7
2 -2.3 0 1.7
3 -4.9 0 -7.1
4 0.0 1 0.0
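I want to check two things:
1) Sign: if the sign of the value in column val1 differs from the sign of the value in column val2 in the same row (for example, the values at index 2 have different signs), change the sign of val2 to match the sign of val1. The desired output is: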
val1 class val2
0 3.2 1 3.2
1 2.4 0 2.7
2 -2.3 0 -1.7
3 -4.9 0 -7.1
4 0.0 1 0.0
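2) Range: check whether the value in column val2 lies within the interval from val1 - 2 to val1 + 2. For example, at index 1, val2 = 2.7 lies within [2.4 - 2, 2.4 + 2]. If the condition is true, I want to change class from 0 to 1. The expected output is: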
val1 class val2
0 3.2 1 3.2
1 2.4 1 2.7
2 -2.3 1 -1.7
3 -4.9 0 -7.1
4 0.0 1 0.0
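If necessary, first convert the values to floats, then set the sign, then run the second check.
Try this:
import numpy as np
# The columns hold strings, so convert them to floats first
df['val1'] = df['val1'].astype(float)
df['val2'] = df['val2'].astype(float)
# Check 1: multiplying by both signs flips val2 only where the signs differ
df['val2'] = df.apply(lambda x: np.sign(x['val1']) * np.sign(x['val2']) * x['val2'], axis=1)
# Check 2: class becomes 1 where val2 is within 2 of val1, and 0 everywhere else
df['class'] = df.apply(lambda x: int(abs(x['val1'] - x['val2']) < 2), axis=1)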
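I think this will solve your problem without using any other libraries:
def signfunc(x, y):
    # Keep y when x and y share a sign (or either is zero); otherwise flip it
    if x * y >= 0:
        return y
    else:
        return -1 * y

# The columns hold strings, so convert them to floats first
df['val1'] = df['val1'].astype(float)
df['val2'] = df['val2'].astype(float)
# Check 1: give val2 the sign of val1
df['val2'] = df.apply(lambda x: signfunc(x.val1, x.val2), axis=1)
print(df)
# Check 2: set class to 1 where val2 is within 2 of val1
df.loc[abs(df["val1"] - df["val2"]) <= 2, 'class'] = 1
print(df)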
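The two sign fixes in this thread behave differently when val1 is exactly 0: signfunc leaves val2 unchanged, while the sign-product expression turns val2 into 0. The sketch below compares the sign-product expression with np.copysign, which is a swapped-in alternative and not something either answer uses. The `demo` frame and its values are made up for illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical rows chosen to exercise the zero case (not from the question)
demo = pd.DataFrame({'val1': [0.0, 2.0, -3.0], 'val2': [-1.5, -4.0, 5.0]})

# Sign-product approach: a zero in val1 zeroes out val2
by_product = demo['val2'] * np.sign(demo['val1']) * np.sign(demo['val2'])

# np.copysign keeps the magnitude of val2 and takes the sign of val1;
# a val1 of +0.0 counts as positive
by_copysign = np.copysign(demo['val2'], demo['val1'])

print(by_product.tolist())
print(by_copysign.tolist())
```

In the question's data the only row with val1 equal to 0 also has val2 equal to 0, so all of these variants give the same result there.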
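As far as I can see, if val1 is positive and val2 is negative, this does not change the sign of val2.
@Boendal, even when val1 is positive and val2 is negative, val2 should change its sign to the sign of val1.
As you can see, val2 in the last row is still negative, and this is a copy of your solution.
@Boendal - understood, then you need to use df['val2'] *= np.sign(df['val1']) * np.sign(df['val2'])
@Sascha - it is a shortcut; it is the same as df['val2'] = df['val2'] * np.sign(df['val1']) * np.sign(df['val2']).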
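For reference, the vectorized expression from the comments can be combined with a boolean mask so that neither check needs apply. This is a minimal sketch on the question's data, assuming the class column should be stored as integers and that rows outside the range keep their current class; the name `within` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'val1': ["3.2", "2.4", "-2.3", "-4.9", "0"],
    'class': ["1", "0", "0", "0", "1"],
    'val2': ["3.2", "2.7", "1.7", "-7.1", "0"],
})

# Convert the string columns to numeric types before doing arithmetic
df['val1'] = df['val1'].astype(float)
df['val2'] = df['val2'].astype(float)
df['class'] = df['class'].astype(int)

# Check 1: multiplying by both signs flips val2 only where the signs differ
df['val2'] = df['val2'] * np.sign(df['val1']) * np.sign(df['val2'])

# Check 2: set class to 1 where val2 is within 2 of val1;
# rows outside the range keep whatever class they already had
within = (df['val1'] - df['val2']).abs() <= 2
df.loc[within, 'class'] = 1

print(df)
```

Unlike the apply-based check 2 above, the mask only ever sets class to 1 and never resets an existing 1 to 0.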