Python: compare dataframe columns by row values & select the one with a matching pattern


I have a df that looks like this:

id1     id2     id3    id4    id5
9890    abc123  CI652  125    nan
156     CI951   9895   nan    nan
CI632   198     nan    nan    nan
nan     nan     145    nan    CI258
9892    9893    nan    nan    nan
abc556  nan     abc887 nan    CI642
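I want to look across all columns and pick one value per row according to this priority:

abc* > 98* > anything other than nan > nan

Based on the selected value, I want to create and populate a new df/column. The expected output is: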

id1     id2     id3    id4    id5    output
9890    abc123  CI652  125    nan    abc123
156     CI951   9895   nan    nan    9895
CI632   198     nan    nan    nan    CI632
nan     nan     145    nan    CI258  145
9892    9893    nan    nan    nan    9892
abc556  nan     abc887 nan    CI642  abc556
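My plan was to iterate over each row of the df with a for loop and use if-else logic to compare the values by priority.

Is there a better way to achieve this? Any insight would be appreciated. TIA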

Not sure whether this is the best way, but you can check the conditions with startswith, weight them according to your priority, and then use df.lookup:

m = df.astype(str)



  • Here is a solution.
  • The basic idea is to apply a function to each row
    (axis=1).
  • Match against the priorities and return the matching value.

P.S. @anky_91's answer is great and concise. This is just another approach.

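For nan, do you need nan as a string, or nan as in np.nan?
I was about to go in this direction, except that I would use stack instead of applymap.
@QuangHoang Sorry for the late reply (I was driving home). That is a great suggestion; feel free to post it as an answer if you like :)
@KrunalPatel No problem. Happy coding :)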
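c1 = m.applymap(lambda x: x.startswith('abc')) * 3     # weight 3 for values starting with 'abc'
c2 = m.applymap(lambda x: x.startswith('98')) * 2      # weight 2 for values starting with '98'
c3 = df.notna().astype(int)                            # weight 1 for any non-NaN value
s = (c1 + c2 + c3).idxmax(axis=1)                      # column with the highest score in each row
df = df.assign(output=df.lookup(s.index, s.values))    # df.lookup requires pandas < 2.0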
      id1     id2     id3    id4    id5  output
0    9890  abc123   CI652  125.0    NaN  abc123
1     156   CI951    9895    NaN    NaN    9895
2   CI632     198     NaN    NaN    NaN   CI632
3     NaN     NaN     145    NaN  CI258     145
4    9892    9893     NaN    NaN    NaN    9892
5  abc556     NaN  abc887    NaN  CI642  abc556
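
DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, and applymap has been deprecated since 2.1, so the scoring snippet above may not run unchanged on a recent install. Below is a minimal sketch of the same weighting idea that avoids both calls. It assumes every non-missing cell is a string (the question does not state the dtypes), rebuilds the sample frame under that assumption, and uses the illustrative names score and best_col, which are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; assumption: non-missing cells are strings.
df = pd.DataFrame({
    "id1": ["9890", "156", "CI632", np.nan, "9892", "abc556"],
    "id2": ["abc123", "CI951", "198", np.nan, "9893", np.nan],
    "id3": ["CI652", "9895", np.nan, "145", np.nan, "abc887"],
    "id4": ["125", np.nan, np.nan, np.nan, np.nan, np.nan],
    "id5": [np.nan, np.nan, np.nan, "CI258", np.nan, "CI642"],
})

m = df.astype(str)  # string view of the frame, as in the answer above

# Score each cell: +3 for 'abc*', +2 for '98*', +1 for any non-missing value.
c1 = m.apply(lambda col: col.str.startswith("abc", na=False)).astype(int) * 3
c2 = m.apply(lambda col: col.str.startswith("98", na=False)).astype(int) * 2
c3 = df.notna().astype(int)
score = c1 + c2 + c3

# Column label with the highest score in each row (the first one wins on ties).
best_col = score.idxmax(axis=1)

# Positional lookup standing in for the removed df.lookup.
rows = np.arange(len(df))
cols = df.columns.get_indexer(best_col)
df["output"] = df.to_numpy()[rows, cols]

print(df)
```

Because idxmax returns the first column on ties, rows with two equally ranked candidates (for example 9892 and 9893) resolve to the leftmost one, which matches the expected output in the question.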
>>> import pandas as pd
>>> import numpy as np
>>> import re
>>> df = pd.DataFrame.from_dict({'a':['abc','2',np.nan,'23423af'], 'b':['98564','98ad456',np.nan,'ab23452fdsa']})
>>> df
    a           b
0   abc        98564
1   2          98ad456
2   NaN        NaN
3   23423af    ab23452fdsa
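>>> def isna(x):  # helper: NaN is the only value that is not equal to itself
...     return x != x
...
>>> def match_pattern(row):  # your priority matching function
...     for val in row:  # priority 1: values starting with 'abc'
...         if isna(val):
...             continue
...         if re.match('^abc.*', str(val)):
...             return val
...     for val in row:  # priority 2: values starting with '98'
...         if isna(val):
...             continue
...         if re.match('^98.*', str(val)):
...             return val
...     for val in row:  # priority 3: the first non-NaN value
...         if not isna(val):
...             return val
...     return np.nan  # every value in the row is NaN
...
>>> df['output'] = df.apply(match_pattern, axis=1)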
>>> df
    a         b          output
0   abc     98564        abc
1   2       98ad456      98ad456
2   NaN     NaN          NaN
3   23423af ab23452fdsa  23423af
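
The session above uses a small toy frame. As a rough check, the same row-wise priority logic can be run against the sample data from the question. The sketch below is a compact variant, not the answer's exact code: the names PREFIXES and pick_by_priority are illustrative, and the frame assumes that all non-missing cells are strings.

```python
import numpy as np
import pandas as pd

# Prefixes in descending priority, taken from the question (abc* > 98*).
PREFIXES = ("abc", "98")

def pick_by_priority(row):
    """Return the first value matching the highest-priority prefix,
    else the first non-missing value, else NaN."""
    values = [v for v in row if pd.notna(v)]
    for prefix in PREFIXES:
        for v in values:
            if str(v).startswith(prefix):
                return v
    return values[0] if values else np.nan

# Sample frame from the question; assumption: non-missing cells are strings.
df = pd.DataFrame({
    "id1": ["9890", "156", "CI632", np.nan, "9892", "abc556"],
    "id2": ["abc123", "CI951", "198", np.nan, "9893", np.nan],
    "id3": ["CI652", "9895", np.nan, "145", np.nan, "abc887"],
    "id4": ["125", np.nan, np.nan, np.nan, np.nan, np.nan],
    "id5": [np.nan, np.nan, np.nan, "CI258", np.nan, "CI642"],
})

df["output"] = df.apply(pick_by_priority, axis=1)
print(df)
```

Keeping the prefixes in one tuple means a new priority level only needs a new entry, at the cost of a Python-level loop per row, so this is likely slower than a column-wise approach on large frames.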