Python: using query to return rows where two lists intersect


python pandas


I have this DataFrame:

import pandas as pd

df = pd.DataFrame([[1, "type_1"], [2, "type_2"], [2, "type_1; type_2"], [2, "type_1; type_3"], [2, "type_3"], [2, "type_1; type_2, type_3"]],
                  columns=["a", "b"])
    a   b
0   1   type_1
1   2   type_2
2   2   type_1; type_2
3   2   type_1; type_3
4   2   type_3
5   2   type_1; type_2, type_3

I need to run a large number of query strings that are read from a configuration file, like this:

my_list = ["type_1", "type_2"]
df.query("a == 2 and b in @my_list")

This currently outputs:

    a   b
1   2   type_2

But I want the output to look like this, because at least one of the values in b is in my_list:

    a   b
0   2   type_2
1   2   type_1; type_2
2   2   type_1; type_3
3   2   type_1; type_2, type_3

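The problem, as you can see, is that some of my column values are actually lists. At the moment they are strings separated by ; but I can convert them to lists. However, I am not sure how that would help me filter, using only .query(), the rows where column b has at least one value in my_list (otherwise I would have to parse the query strings, which would get messy).

This would be the equivalent code with lists: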
df = pd.DataFrame([[1, ["type_1"]], [2, ["type_2"]], [2, ["type_1", "type_2"]], [2, ["type_1", "type_3"]], [2, ["type_3"]], [2, ["type_1", "type_2", "type_3"]]],
                  columns=["a", "b"])

Actually, I was wrong. It looks like the "python" engine supports this.
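A minimal sketch of how such a query might look with the "python" engine, not necessarily the exact code from the answer. It assumes the string form of column b; the variable name pattern and the use of re.escape and local_dict are illustrative choices, not something stated in the thread.

```python
import re

import pandas as pd

df = pd.DataFrame(
    [[1, "type_1"], [2, "type_2"], [2, "type_1; type_2"],
     [2, "type_1; type_3"], [2, "type_3"], [2, "type_1; type_2, type_3"]],
    columns=["a", "b"],
)
my_list = ["type_1", "type_2"]

# Assumption: build a regex alternation from my_list, escaping each value
# so that any regex metacharacters in it are matched literally.
pattern = "|".join(re.escape(value) for value in my_list)

# engine="python" allows method calls such as .str.contains inside the
# query string; local_dict makes @pattern resolvable regardless of the
# calling scope.
result = df.query(
    "a == 2 and b.str.contains(@pattern)",
    engine="python",
    local_dict={"pattern": pattern},
)
print(result)
```

Because the whole condition stays inside one query string, it could in principle be stored in a configuration file alongside the other queries. Note that this is a substring match, so a value such as type_11 would also match type_1.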

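(Old answer) Your query can be split into two parts: the part that needs a substring check, and everything else.
The two masks can be computed separately. I suggest using str.contains and DataFrame.eval. You can then AND the masks together and use the result to filter df.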

m1 = df.eval("a == 2")
m2 = df['b'].str.contains('|'.join(my_list))

df[m1 & m2]

   a                       b
1  2                  type_2
2  2          type_1; type_2
3  2          type_1; type_3
5  2  type_1; type_2, type_3

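You can use str.split to recreate the list-like column first, and then use isin together with any. Note that isin is an exact match, which means that a value such as type_11 will return False with isin.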

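# Split on ';' or ',' plus any following spaces so that each token matches exactly;
# any(axis=1) replaces the positional any(1), which newer pandas versions reject.
df[(pd.DataFrame(df.b.str.split(r'[;,]\s*').tolist()).isin(my_list).any(axis=1))&(df.a==2)]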
Out[88]: 
   a                       b
1  2                  type_2
2  2          type_1; type_2
3  2          type_1; type_3
5  2  type_1; type_2, type_3
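The exact-match caveat about isin can be illustrated with a small sketch. The sample Series below is hypothetical (the value type_11 does not appear in the question's data) and is only meant to contrast the substring check with the token-wise check.

```python
import pandas as pd

# Hypothetical sample: "type_11" is not in my_list but contains "type_1".
s = pd.Series(["type_1; type_3", "type_11", "type_3"])
my_list = ["type_1", "type_2"]

# Substring check: "type_11" is flagged because it contains "type_1".
substring_mask = s.str.contains("|".join(my_list))

# Exact token check: split on ';' plus optional spaces, then test each
# token against my_list and keep rows where any token matches.
tokens = s.str.split(r";\s*", expand=True)
exact_mask = tokens.isin(my_list).any(axis=1)

print(substring_mask.tolist())
print(exact_mask.tolist())
```

Which behaviour is preferable depends on the data: the substring check is shorter to write, while the token-wise check avoids accidental matches on longer names.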

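@coldspeed Is there no solution, even if the values in my b column are lists?
Not with query. I'd suggest using str.contains and checking with separate masks. Are you interested in seeing how that is done?
@coldspeed Yes.
@coldspeed Do you know whether pandas is usually open to PRs that would add this (if you contribute to the library)?
I was wrong; this is supported. Check my edit.
You should also take the "a == 2" condition into account.
Oh my god, that's awesome, I was starting to lose hope :)
@ClaudiuCreanga Learned something new today, thanks for the question :)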