Python pandas: select from the column whose index corresponds to the values in another column


python pandas dataframe


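Apologies for the bad title.

Suppose I have two DataFrames about field sampling locations. DF1 contains the sample ID, coordinates, year of recording, and so on. DF2 contains a meteorological variable, with the values for each year provided as columns: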

import numpy as np
import pandas as pd
df1 = pd.DataFrame(data={'ID': [10, 20, 30], 'YEAR': [1980, 1981, 1991]}, index=[1, 2, 3])
df2 = pd.DataFrame(data=np.random.randint(0, 100, size=(3, 11)), columns=['year_{0}'.format(x) for x in range(1980, 1991)], index=[10, 20, 30])

print(df1)
>   ID YEAR
  1 10 1980
  2 20 1981
  3 30 1991

print(df2)
>    year_1980 year_1981 ... year_1990
  10 48 61 ... 53
  20 68 69 ... 21
  30 76 37 ... 70

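Note how the plot IDs in DF1 correspond to DF2.index, and how the sampling years in DF1 extend beyond the coverage of DF2. I want to add the values from DF2 that correspond to the YEAR column in DF1 as a new column in DF1. What I have so far is: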

def grab(df, plot_id, yr):
    try:
        out = df.loc[plot_id, 'year_{}'.format(yr)]
    except KeyError:
        out = -99
    return out

df1['meteo_val'] = df1.apply(lambda row: grab(df2, row['ID'], row['YEAR']), axis=1)
print(df1)
>   ID YEAR meteo_val
  1 10 1980 48
  2 20 1981 69 
  3 30 1991 -99

This works, but it seems to take a long time to compute. I would like to know a smarter, faster way to solve this. Any suggestions?

Setup

np.random.seed(0)
df1 = pd.DataFrame(data={'ID': [10, 20, 30], 'YEAR': [1980, 1981, 1991]}, index=[1, 2, 3])
df2 = pd.DataFrame(data=np.random.randint(0, 100, size=(3, 11)), columns=['year_{0}'.format(x) for x in range(1980, 1991)], index=[10, 20, 30])
Solution with DataFrame.lookup, followed by an alternative with DataFrame.join and DataFrame.stack:

mapper = df1.assign(YEAR=('year_' + df1['YEAR'].astype(str)))
c2 = mapper['ID'].isin(df2.index)
c1 = mapper['YEAR'].isin(df2.columns)
mapper = mapper.loc[c1 & c2]
df1.loc[c2 & c1, 'meteo_val'] = df2.lookup(mapper['ID'], mapper['YEAR'])
df1['meteo_val'] = df1['meteo_val'].fillna(-99)

print(df1)
>   ID YEAR meteo_val
  1 10 1980 44.0
  2 20 1981 88.0
  3 30 1991 -99.0
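One caveat: DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so the lookup-based snippet only runs on older versions. A minimal sketch of the same label-based lookup for newer versions, assuming the setup shown earlier, could use Index.get_indexer together with NumPy integer indexing. The names rows, cols, found and vals are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df1 = pd.DataFrame({'ID': [10, 20, 30], 'YEAR': [1980, 1981, 1991]}, index=[1, 2, 3])
df2 = pd.DataFrame(np.random.randint(0, 100, size=(3, 11)),
                   columns=['year_{0}'.format(x) for x in range(1980, 1991)],
                   index=[10, 20, 30])

# Positions of each ID and year label in df2; get_indexer returns -1 for missing labels.
rows = df2.index.get_indexer(df1['ID'])
cols = df2.columns.get_indexer('year_' + df1['YEAR'].astype(str))
found = (rows != -1) & (cols != -1)

# Start from the -99 sentinel and overwrite only the (ID, year) pairs present in df2.
vals = np.full(len(df1), -99)
vals[found] = df2.to_numpy()[rows[found], cols[found]]
df1['meteo_val'] = vals
```

Because no NaN is introduced along the way, meteo_val keeps an integer dtype here instead of becoming float.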

df1 = df1.join(
    df2.set_axis(df2.columns.str.split('_').str[1].astype(int), axis=1)
       .stack()
       .rename('meteo_val'),
    on=['ID', 'YEAR'],
    how='left',
).fillna(-99)
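The join-based one-liner packs several steps together. The sketch below unpacks it into named intermediate results, assuming the same setup; the names years, wide, stacked and result are illustrative. The only deliberate difference is that fillna is applied to the meteo_val column alone rather than to the whole frame.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df1 = pd.DataFrame({'ID': [10, 20, 30], 'YEAR': [1980, 1981, 1991]}, index=[1, 2, 3])
df2 = pd.DataFrame(np.random.randint(0, 100, size=(3, 11)),
                   columns=['year_{0}'.format(x) for x in range(1980, 1991)],
                   index=[10, 20, 30])

# 1. Turn the 'year_1980' style column labels into integer years.
years = df2.columns.str.split('_').str[1].astype(int)
wide = df2.set_axis(years, axis=1)

# 2. Stack the wide table into a Series indexed by (ID, year) pairs.
stacked = wide.stack().rename('meteo_val')

# 3. Left-join on the ID and YEAR columns; pairs missing from df2 become NaN,
#    which is then replaced by the -99 sentinel.
result = df1.join(stacked, on=['ID', 'YEAR'], how='left')
result['meteo_val'] = result['meteo_val'].fillna(-99)
```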