Python: check whether the nth value in a DataFrame row equals the nth character of a string


python pandas numpy data-structures


I have a df:

df =
     c1  c2   c3   c4  c5
  0  K   6    nan  Y   V
  1  H   nan  g    5   nan
  2  U   B    g    Y   L

and a string:

s = 'HKg5'

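I want to return the rows where s[0] equals the value in c1, s[1] equals the value in c2, and so on, with the twist that in some cases the value at position i is nan and should still count as a match.

For example, row 1 of the df above matches the string: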

    row 1=
           c1  c2   c3   c4  c5
        1  H   nan  g    5   nan
                                                match=True, even though c2 and c5 are nan
     s   = H   K    g    5

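The string length is also dynamic, so my df columns can go beyond c10.

I am using df.apply, but I can't quite figure it out. I would like to write a function to pass to df.apply and pass the string along with it.

Thanks for your help.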
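Since the question asks for a function to pass to df.apply together with the string, here is a minimal row-wise sketch. It assumes every cell is either NaN or a one-character string (so the digits are stored as '6' and '5', not as integers), and that a column beyond the end of the string only matches if it is NaN. `row_matches` is a hypothetical name chosen for illustration, not part of pandas.

```python
import numpy as np
import pandas as pd

# The example frame from the question, with digits assumed to be strings.
df = pd.DataFrame({
    "c1": ["K", "H", "U"],
    "c2": ["6", np.nan, "B"],
    "c3": [np.nan, "g", "g"],
    "c4": ["Y", "5", "Y"],
    "c5": ["V", np.nan, "L"],
})
s = "HKg5"


def row_matches(row, string):
    """Hypothetical helper: True if each cell is NaN or equals string[i]."""
    for i, value in enumerate(row):
        if pd.isna(value):
            continue  # NaN is treated as a wildcard
        if i >= len(string) or value != string[i]:
            return False
    return True


# args=(s,) forwards the string as the second positional argument.
mask = df.apply(row_matches, axis=1, args=(s,))
matches = df[mask]
print(matches)
```

This calls a Python function once per row, so it is likely to be slower than a vectorized comparison on large frames.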

Output from running Chris's answer:

  df=  
        c1  c2  c3  c4  c5 
     0  K   6  NaN  Y   V
     1  H  NaN  g   5  NaN
     2  U   B   g   Y   L

  s = 'HKg5'
  s1 = pd.Series(list(s), index=[f'c{x+1}' for x in range(len(s))])
  df.loc[((df == s1) | (df.isna())).all(1)]

Output (only the column headers, i.e. an empty df):

  `c1  c2  c3  c4  c5`

Create a helper Series from the string and filter using boolean logic:

s1 = pd.Series(list(s), index=[f'c{x+1}' for x in range(len(s))])

# print(s1)    
# c1    H
# c2    K
# c3    g
# c4    5
# dtype: object
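A variant of the helper that does not hard-code the `c1`, `c2`, ... names is sketched below. It assumes the characters map to the frame's columns by position and that the string is no longer than the number of columns. Reindexing to the full column list gives the helper exactly the same labels as the frame, which may matter on recent pandas releases, where comparing a DataFrame with a Series whose labels differ from the columns can raise instead of aligning.

```python
import numpy as np
import pandas as pd

# The example frame from the question, with digits assumed to be strings.
df = pd.DataFrame({
    "c1": ["K", "H", "U"],
    "c2": ["6", np.nan, "B"],
    "c3": [np.nan, "g", "g"],
    "c4": ["Y", "5", "Y"],
    "c5": ["V", np.nan, "L"],
})
s = "HKg5"

# Assumption: len(s) <= number of columns, matched by position.
# Label each character with the corresponding column name, then reindex
# so that columns past the end of the string get NaN in the helper.
s1 = pd.Series(list(s), index=df.columns[: len(s)]).reindex(df.columns)
print(s1)
```

Here `c5` has no character in the string, so its entry in the helper is NaN.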

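The logic is: keep the cells where df equals (==) this value or (|) is nan (isna). Use all along axis 1 to return the rows where all values are True:

df.loc[((df == s1) | (df.isna())).all(1)]

[out]

   c1  c2  c3  c4  c5
1  H  NaN  g   5  NaN

So, as a function, you could do:

def df_match_string(frame, string):
    s1 = pd.Series(list(string), index=[f'c{x+1}' for x in range(len(string))])
    return ((frame == s1) | (frame.isna())).all(1)

df_match_string(df, s)

[out]

0    False
1     True
2    False
dtype: bool

Update

I can't reproduce your problem with the example provided. My guess is that some values in the DataFrame have leading/trailing whitespace. Before trying the solution above, try this preprocessing step:

for col in df:
    df[col] = df[col].str.strip()

Hi, I have a small problem here, I get a warning: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison result = method(y). And the return is incorrect.

It looks like that is just a warning, caused by a bug in numpy; check this answer. If the return is incorrect, could you provide a reproducible example of a row that doesn't work as expected, together with the expected result?

Please check, I edited my original post and included your output; it returns an empty df. @Chris A

I can't reproduce the issue; the code works for me with this example. The only thing I can think of is that some of your columns may have leading or trailing whitespace? For example, the value in row 1 of c1 is actually "H " (note the space after the H).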
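The whitespace guess above can be checked with a small sketch like the one below. The data is made up for illustration: it adds a trailing space in c1 and, as a further assumption not confirmed anywhere in the thread, an integer 5 instead of the string '5' in c4. `normalize_cells` is a hypothetical helper that leaves missing cells alone and turns everything else into a stripped string, and the matching function uses a helper Series reindexed to the frame's columns, which is a variation on the answer's code rather than the original.

```python
import numpy as np
import pandas as pd


def normalize_cells(frame):
    """Hypothetical helper: stripped strings for non-missing cells."""
    def clean(value):
        return value if pd.isna(value) else str(value).strip()
    # Apply the cleaner column by column, cell by cell.
    return frame.apply(lambda col: col.map(clean))


def df_match_string(frame, string):
    # Assumption: characters map to columns by position.
    s1 = pd.Series(list(string), index=frame.columns[: len(string)])
    s1 = s1.reindex(frame.columns)
    # A cell passes if it equals its character or is NaN.
    return (frame.eq(s1, axis="columns") | frame.isna()).all(axis=1)


# Made-up data: "H " has a trailing space and 5 is an int, not "5".
raw = pd.DataFrame({
    "c1": ["K", "H ", "U"],
    "c2": ["6", np.nan, "B"],
    "c3": [np.nan, "g", "g"],
    "c4": ["Y", 5, "Y"],
    "c5": ["V", np.nan, "L"],
})
s = "HKg5"

before = df_match_string(raw, s)                  # no row matches
after = df_match_string(normalize_cells(raw), s)  # row 1 matches
print(before.tolist(), after.tolist())
```

One design note: on a column of mixed types, `.str.strip()` returns NaN for cells that are not strings, and this filter would then treat those cells as wildcards. Converting with `str(value)` first avoids that side effect.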