Python: find the closest column value in each row

Here is a sample of a larger dataset:

import pandas as pd

df_old = pd.DataFrame({'code': ['fea-1','fea-132','fea-223','fea-394','fea-595','fea-130','fea-495'],
                   'forecastWind_low':[20,15,0,45,45,25,45],
                   'forecastWind_high':['NaN' ,30,'NaN',55,65,35,'NaN'],
                   'obs_windSpeed':[20,11,3,65,55,'NaN',55]})
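I have forecast wind speeds that I need to compare with the observed values. Ultimately, I need to find which forecast speed (low or high) is closest to the observed wind speed, to get output like this: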

df_new = pd.DataFrame({'code': ['fea-1','fea-132','fea-223','fea-394','fea-595','fea-130','fea-495'],
                   'forecastWind_low':[20,15,0,45,45,25,45],
                   'forecastWind_high':['NaN' ,30,'NaN',55,65,35,'NaN'],
                   'obs_windSpeed':[20,11,3,65,55,'NaN',55],
                   'nearest_forecast_windSpeed':[20,15,0,55,45,'NaN',45]})

Create a custom comparison function and apply it across the rows:

import numpy as np

# Assumes missing values are real np.nan floats, not the string 'NaN'
def check_speed_diff(high,low,obs):
    if np.isnan(obs):
        return np.nan
    elif np.isnan(high):
        return low
    elif np.isnan(low):
        return high
    
    if abs(high-obs)<abs(low-obs):
        return high
    else:
        return low

df_old.apply(lambda x: 
    check_speed_diff(
        x.forecastWind_high,
        x.forecastWind_low,
        x.obs_windSpeed
    ),
    axis=1
)
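As the later comments in this thread point out, np.isnan raises a TypeError on the string 'NaN' used in the sample data, so this function only works once the missing values are real NaNs. A minimal sketch, assuming the df_old sample from the question, that converts the numeric columns with pd.to_numeric first and stores the result in a new column:

```python
import numpy as np
import pandas as pd

df_old = pd.DataFrame({
    'code': ['fea-1', 'fea-132', 'fea-223', 'fea-394', 'fea-595', 'fea-130', 'fea-495'],
    'forecastWind_low': [20, 15, 0, 45, 45, 25, 45],
    'forecastWind_high': ['NaN', 30, 'NaN', 55, 65, 35, 'NaN'],
    'obs_windSpeed': [20, 11, 3, 65, 55, 'NaN', 55],
})

def check_speed_diff(high, low, obs):
    # Same logic as the answer above: no observation -> NaN,
    # only one forecast available -> that forecast, otherwise the closer one
    if np.isnan(obs):
        return np.nan
    elif np.isnan(high):
        return low
    elif np.isnan(low):
        return high
    # Ties go to the low forecast
    return high if abs(high - obs) < abs(low - obs) else low

num_cols = ['forecastWind_low', 'forecastWind_high', 'obs_windSpeed']
df = df_old.copy()
# Turn the 'NaN' strings into real float NaN so np.isnan accepts them
df[num_cols] = df[num_cols].apply(pd.to_numeric, errors='coerce')

df['nearest_forecast_windSpeed'] = df.apply(
    lambda x: check_speed_diff(x.forecastWind_high, x.forecastWind_low, x.obs_windSpeed),
    axis=1,
)
print(df)
```

The only change to the answer's logic is the conversion step; the comparison itself is untouched.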

Here is another way to achieve what you are looking for. It also lets you compare more than two columns:

import numpy as np
import pandas as pd

col = ['forecastWind_low','forecastWind_high']
comparecol = ['obs_windSpeed']
# Convert everything (including the 'NaN' strings) to float
df[col + comparecol] = df[col + comparecol].astype(float)
# One row per (original row, forecast column), joined back to its observation
dfmerge = pd.merge(df[col].stack().reset_index(-1),df[comparecol],left_index=True,right_index=True,how='left')
dfmerge = dfmerge.rename(columns = {'level_1':'windforecast',0:'Amount'})
dfmerge['difference'] = abs(dfmerge['obs_windSpeed'] - dfmerge['Amount'])
# A stable sort keeps the column order on ties (low before high)
dfmerge = dfmerge.sort_values(by='difference',ascending=True,kind='mergesort')
# Keep only the closest forecast for each original row
dfmerge = dfmerge.groupby(level=0).head(1)
df = pd.merge(df,dfmerge['Amount'],left_index=True,right_index=True,how='left')
df.loc[df['obs_windSpeed'].isna(),'Amount'] = np.nan
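The same idea (measure the distance from every forecast column to the observation and keep the smallest) can also be sketched without the stack/merge round trip by taking NumPy's argmin over a distance matrix. This is a swapped-in technique, not the answer's code. It assumes the forecast columns are listed in the order you want ties broken, and the helper name nearest_value is hypothetical:

```python
import numpy as np
import pandas as pd

def nearest_value(df, cols, obs_col):
    """Hypothetical helper: per row, the value in `cols` closest to `obs_col`."""
    # Coerce 'NaN' strings (and anything else non-numeric) to real NaN
    values = df[cols].apply(pd.to_numeric, errors='coerce').to_numpy(dtype=float)
    obs = pd.to_numeric(df[obs_col], errors='coerce').to_numpy(dtype=float)
    # Absolute distance of each forecast from the observation, shape (n_rows, n_cols)
    dist = np.abs(values - obs[:, None])
    # A missing forecast (or missing observation) must never be the minimum
    dist = np.where(np.isnan(dist), np.inf, dist)
    # argmin returns the first minimum, so ties go to the earlier column in `cols`
    pos = dist.argmin(axis=1)
    nearest = values[np.arange(len(df)), pos]
    # No observation -> no answer
    nearest[np.isnan(obs)] = np.nan
    return pd.Series(nearest, index=df.index)

df_old = pd.DataFrame({
    'code': ['fea-1', 'fea-132', 'fea-223', 'fea-394', 'fea-595', 'fea-130', 'fea-495'],
    'forecastWind_low': [20, 15, 0, 45, 45, 25, 45],
    'forecastWind_high': ['NaN', 30, 'NaN', 55, 65, 35, 'NaN'],
    'obs_windSpeed': [20, 11, 3, 65, 55, 'NaN', 55],
})
df_old['nearest_forecast_windSpeed'] = nearest_value(
    df_old, ['forecastWind_low', 'forecastWind_high'], 'obs_windSpeed')
print(df_old)
```

Adding a third forecast column only means extending the `cols` list.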

By modifying Jeff's solution, I managed to come up with the following:

import numpy as np

# Missing values are the literal string 'NaN' in this dataset
def check_speed_diff(high,low,obs):
    if obs == 'NaN':
        return np.nan
    if low != 'NaN' and high == 'NaN':
        return low
    if low == 'NaN' and high != 'NaN':
        return high
    if low != 'NaN' and high != 'NaN':
        if abs(high-obs)<abs(low-obs):
            return high
        else:
            return low
Applying the function as Jeff suggested:

df['nearest_forecastWindSpeed'] = df.apply(lambda x: check_speed_diff(
        x.forecast_WindSpeed_high, 
        x.forecast_WindSpeed_low,
        x.windSpeed),axis=1)
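Comparing against the literal string 'NaN' works for this sample, but it misses real np.nan values, and when both forecasts are 'NaN' the function above reaches its end without a return statement (so it yields None). A sketch of a check that accepts either form of missing value, using a hypothetical helper named is_missing and the column names from the question's df_old sample:

```python
import numpy as np
import pandas as pd

def is_missing(v):
    # Hypothetical helper: the string 'NaN' (any case) and real NaN/None count as missing
    if isinstance(v, str):
        return v.strip().lower() == 'nan'
    return bool(pd.isna(v))

def check_speed_diff(high, low, obs):
    if is_missing(obs):
        return np.nan
    if is_missing(high) and is_missing(low):
        # Explicit result instead of an implicit None
        return np.nan
    if is_missing(high):
        return low
    if is_missing(low):
        return high
    # Ties go to the low forecast, as in the original function
    return high if abs(high - obs) < abs(low - obs) else low

df_old = pd.DataFrame({
    'code': ['fea-1', 'fea-132', 'fea-223', 'fea-394', 'fea-595', 'fea-130', 'fea-495'],
    'forecastWind_low': [20, 15, 0, 45, 45, 25, 45],
    'forecastWind_high': ['NaN', 30, 'NaN', 55, 65, 35, 'NaN'],
    'obs_windSpeed': [20, 11, 3, 65, 55, 'NaN', 55],
})
# Works directly on the raw data with 'NaN' strings
df_old['nearest_forecast_windSpeed'] = df_old.apply(
    lambda x: check_speed_diff(x.forecastWind_high, x.forecastWind_low, x.obs_windSpeed),
    axis=1,
)
print(df_old)
```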

Maybe not the most efficient, but it got the job done... Thanks for the help, everyone.
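Row-wise apply calls a Python function once per row, which can get slow on a large dataset. A vectorized sketch with np.where, assuming the column names of the question's df_old sample (not the asker's real column names) and converting the 'NaN' strings first:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'code': ['fea-1', 'fea-132', 'fea-223', 'fea-394', 'fea-595', 'fea-130', 'fea-495'],
    'forecastWind_low': [20, 15, 0, 45, 45, 25, 45],
    'forecastWind_high': ['NaN', 30, 'NaN', 55, 65, 35, 'NaN'],
    'obs_windSpeed': [20, 11, 3, 65, 55, 'NaN', 55],
})
cols = ['forecastWind_low', 'forecastWind_high', 'obs_windSpeed']
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')

low, high, obs = df['forecastWind_low'], df['forecastWind_high'], df['obs_windSpeed']
# Comparisons involving NaN are False, so a missing high forecast falls back to low;
# ties (equal distances) also go to low
nearest = np.where((high - obs).abs() < (low - obs).abs(), high, low)
# If the low forecast is missing, use the high one instead
nearest = np.where(low.isna(), high, nearest)
# No observation -> NaN
nearest = np.where(obs.isna(), np.nan, nearest)
df['nearest_forecast_windSpeed'] = nearest
print(df)
```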

What have you tried?
I tried the following example (), but I think the extra unrelated columns and the NaN/error strings in some rows are the problem...
Please show us what you have tried; the code is usually helpful.
df=df_old.fillna(0)
then
df['nearest_forecast_windSpeed']=np.where(df.obs_windSpeed.sub(df.forecastWind_low)
If your NaNs are actually strings rather than np.nan, check at the start of the comparison function whether the 3 values are strings, instead of checking whether they are np.nan.
Thanks. Your answer really helped me figure this out.
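The last two comments explain why the first answer failed on the sample data: its missing values are the string 'NaN', not a float NaN. A quick check of how NumPy and pandas treat each form (the helper name missing_checks is hypothetical):

```python
import numpy as np
import pandas as pd

def missing_checks(value):
    """Hypothetical helper: how np.isnan, pd.isna and pd.to_numeric see `value`."""
    try:
        np_result = bool(np.isnan(value))
    except TypeError:
        # np.isnan is not defined for strings
        np_result = 'TypeError'
    return {
        'np.isnan': np_result,
        'pd.isna': bool(pd.isna(value)),
        # to_numeric parses 'NaN' into a real NaN
        'to_numeric': bool(pd.to_numeric(pd.Series([value]), errors='coerce').isna().iloc[0]),
    }

print(missing_checks('NaN'))   # the string used in the sample data
print(missing_checks(np.nan))  # a real missing value
```

So either convert the columns up front or test for the string explicitly, as the comment suggests.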