Python: How do I search for the current column's values in another column and show the id adjacent to the match in pandas?


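I have a DataFrame with four columns: col_1, col1_id, col_2 and col2_id. I want to look up each col_2 value in col_1 and, wherever there is a match, write the corresponding col1_id into col2_id.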

 col_1  col1_id col_2  col2_id
    A        1   NaN      NaN
    B        2     K      NaN
    D        3     A      NaN
    J        4   NaN      NaN
    E        5     H      NaN
    Z        6   NaN      NaN
    H        7     H      NaN
    K        8     Z      NaN
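For anyone who wants to reproduce the frame above, here is a minimal sketch that rebuilds it. It assumes the blanks shown as NaN are real missing values (np.nan) and that col1_id holds integers; neither is stated explicitly in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
# Assumption: missing entries are np.nan and col1_id is an integer column.
df = pd.DataFrame({
    'col_1':   ['A', 'B', 'D', 'J', 'E', 'Z', 'H', 'K'],
    'col1_id': [1, 2, 3, 4, 5, 6, 7, 8],
    'col_2':   [np.nan, 'K', 'A', np.nan, 'H', np.nan, 'H', 'Z'],
    'col2_id': np.nan,  # scalar is broadcast to every row
})
print(df)
```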

Any help would be appreciated, thanks.

There are two possible solutions; the output of the first one looks nicer.

I think you need map with a dictionary d created from the columns col_1 and col1_id:

d = df[['col_1','col1_id']].set_index('col_1').to_dict()
print(d)
{'col1_id': {'A': 1, 'B': 2, 'E': 5, 'D': 3, 'H': 7, 'K': 8, 'J': 4, 'Z': 6}}

df['col2_id'] = df.col_2.map(d['col1_id'])
print(df)
  col_1  col1_id col_2  col2_id
0     A        1   NaN      NaN
1     B        2     K      8.0
2     D        3     A      1.0
3     J        4   NaN      NaN
4     E        5     H      7.0
5     Z        6   NaN      NaN
6     H        7     H      7.0
7     K        8     Z      6.0
Alternatively, you can use:
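One compact form, offered as a sketch rather than the answer's own code, passes a Series indexed by col_1 straight to map, so no intermediate dictionary is needed. The name lookup is hypothetical, and the sketch assumes the values in col_1 are unique, as they are in the question's data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col_1':   ['A', 'B', 'D', 'J', 'E', 'Z', 'H', 'K'],
    'col1_id': [1, 2, 3, 4, 5, 6, 7, 8],
    'col_2':   [np.nan, 'K', 'A', np.nan, 'H', np.nan, 'H', 'Z'],
    'col2_id': np.nan,
})

# Series whose index is col_1 and whose values are col1_id.
# Assumption: col_1 has no duplicates, otherwise the lookup is ambiguous.
lookup = df.set_index('col_1')['col1_id']

# map() looks each col_2 value up in the index of `lookup`;
# NaN and unmatched values come back as NaN.
df['col2_id'] = df['col_2'].map(lookup)
print(df)
```

Because unmatched rows are NaN, col2_id ends up as a float column, which matches the 8.0, 1.0, 7.0 values in the output shown earlier.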

Timings

def pil(df):
    df = df.set_index('col_1')
    df['col2_id'] = df.col_2.apply(lambda x: x if pd.isnull(x) else df.loc[x, 'col1_id'])
    return df.reset_index()

def jez(df):
    df['col2_id'] = df.col_2.map(df.set_index('col_1').to_dict()['col1_id'])
    return df

print(pil(df1))
print(jez(df))

In [34]: %timeit jez(df)
1000 loops, best of 3: 1.48 ms per loop

In [35]: %timeit pil(df1)
The slowest run took 4.23 times longer than the fastest. This could mean that an intermediate result is being cached 
100 loops, best of 3: 2.56 ms per loop
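The %timeit lines above only work inside IPython. As a rough sketch, a similar comparison can be run from a plain script with the standard timeit module. The function names map_lookup and apply_lookup are hypothetical stand-ins for jez and pil, both work on copies so repeated runs do not interfere, and the numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col_1':   ['A', 'B', 'D', 'J', 'E', 'Z', 'H', 'K'],
    'col1_id': [1, 2, 3, 4, 5, 6, 7, 8],
    'col_2':   [np.nan, 'K', 'A', np.nan, 'H', np.nan, 'H', 'Z'],
    'col2_id': np.nan,
})

def map_lookup(frame):
    # Vectorised lookup: map col_2 against a Series indexed by col_1.
    out = frame.copy()
    out['col2_id'] = out['col_2'].map(out.set_index('col_1')['col1_id'])
    return out

def apply_lookup(frame):
    # Row-by-row lookup with .loc, skipping missing col_2 values.
    indexed = frame.set_index('col_1')
    out = frame.copy()
    out['col2_id'] = frame['col_2'].apply(
        lambda x: x if pd.isnull(x) else indexed.loc[x, 'col1_id'])
    return out

# Best of 3 repeats, 100 calls each, reported per call in milliseconds.
t_map = min(timeit.repeat(lambda: map_lookup(df), number=100, repeat=3)) / 100
t_apply = min(timeit.repeat(lambda: apply_lookup(df), number=100, repeat=3)) / 100
print('map:   %.3f ms per call' % (t_map * 1e3))
print('apply: %.3f ms per call' % (t_apply * 1e3))
```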

Try:


In my opinion, this question looks like a standard RDBMS task, so you can use merge():

df['col2_id'] = pd.merge(df, df[['col_1', 'col1_id']], left_on='col_2', right_on='col_1', how='left')['col1_id_y']

Please check the solutions; if the desired output is different, you can add it to the question. Thanks.
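The merge idea can also be sketched end to end on the question's data. This is one possible variant, not necessarily the answerer's exact code: the lookup columns are renamed first so the join key is simply col_2 and no _x/_y suffixes appear. The names lookup and matched_id are hypothetical, and the sketch assumes col_1 is unique so the left join keeps one row per original row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col_1':   ['A', 'B', 'D', 'J', 'E', 'Z', 'H', 'K'],
    'col1_id': [1, 2, 3, 4, 5, 6, 7, 8],
    'col_2':   [np.nan, 'K', 'A', np.nan, 'H', np.nan, 'H', 'Z'],
    'col2_id': np.nan,
})

# Lookup table: rename col_1 to col_2 so it can serve as the join key,
# and give the id column a name that does not clash with df's columns.
lookup = df[['col_1', 'col1_id']].rename(
    columns={'col_1': 'col_2', 'col1_id': 'matched_id'})

# A left join keeps every row of df in its original order;
# rows whose col_2 has no match get NaN in matched_id.
merged = df.merge(lookup, on='col_2', how='left')

# Assign by position to avoid relying on index alignment.
df['col2_id'] = merged['matched_id'].to_numpy()
print(df)
```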