Python: How do I filter a DataFrame by string values and matching integers in rows?

python pandas dataframe

Thanks for your help. I'm fairly new to pandas and didn't find this particular query in my search results.

I have a dataframe:

+-----+---------+----------+
| id  |  value  | match_id |
+-----+---------+----------+
| A10 | grass   |        1 |
| B45 | cow     |        3 |
| B98 | bird    |        6 |
| B17 | grass   |        1 |
| A20 | tree    |        2 |
| A87 | farmer  |        5 |
| B11 | grass   |        1 |
| A33 | chicken |        4 |
| B56 | tree    |        2 |
| A23 | farmer  |        5 |
| B65 | cow     |        3 |
+-----+---------+----------+

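I need to filter this dataframe for rows that share a match_id value, on the condition that the id values within each matching group include both the string A and the string B.

This is the expected output: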

+-----+-------+----------+
| id  | value | match_id |
+-----+-------+----------+
| A10 | grass |        1 |
| B17 | grass |        1 |
| A20 | tree  |        2 |
| B11 | grass |        1 |
| B56 | tree  |        2 |
+-----+-------+----------+

How can I do this in, say, one line of code? A reproducible example follows:

import pandas as pd

data_example = {'id': ['A10', 'B45', 'B98', 'B17', 'A20', 'A87', 'B11', 'A33', 'B56', 'A23', 'B65'],
                'value': ['grass', 'cow', 'bird', 'grass', 'tree', 'farmer', 'grass', 'chicken', 'tree', 'farmer', 'cow'],
                'match_id': [1, 3, 6, 1, 2, 5, 1, 4, 2, 5, 3]}
df_example = pd.DataFrame(data=data_example)

data_expected = {'id': ['A10', 'B17', 'A20', 'B11', 'B56'],
                 'value': ['grass', 'grass', 'tree', 'grass', 'tree'],
                 'match_id': [1, 1, 2, 1, 2]}
df_expected = pd.DataFrame(data=data_expected)

Thank you all!
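A one-liner seems hard, but you can use str.extract to pull the two required strings out of id, then groupby match_id and use any to check whether each match_id has at least one row containing each string. Calling all along axis 1 then gives True for every match_id that matches both strings. Finally, map that Series onto the match_id column and select only the rows where it is True:

s = df_example['id'].str.extract('(A)|(B)').notna()\
    .groupby(df_example['match_id']).any().all(axis=1)
df_expected = df_example.loc[df_example['match_id'].map(s), :]
print(df_expected)

    id  value  match_id
0  A10  grass         1
3  B17  grass         1
4  A20   tree         2
6  B11  grass         1
8  B56   tree         2

A different take on @Ben.T's solution:

# create a helper column that combines the letters per group
res = (df_example
       # the id column starts with a letter
       .assign(letter=lambda x: x.id.str[0])
       .groupby('match_id')
       .letter.transform(','.join)
       )

# work on a copy so that df_example itself is left unchanged
df = df_example.copy()
df['grp'] = res
df

     id    value  match_id    grp
0   A10    grass         1  A,B,B
1   B45      cow         3    B,B
2   B98     bird         6      B
3   B17    grass         1  A,B,B
4   A20     tree         2    A,B
5   A87   farmer         5    A,A
6   B11    grass         1  A,B,B
7   A33  chicken         4      A
8   B56     tree         2    A,B
9   A23   farmer         5    A,A
10  B65      cow         3    B,B

# filter for groups that contain A and B, and keep only the relevant columns
df.loc[df.grp.str.contains('A,B'), "id":"match_id"]

    id  value  match_id
0  A10  grass         1
3  B17  grass         1
4  A20   tree         2
6  B11  grass         1
8  B56   tree         2

# or use a list comprehension that checks for both A and B regardless of
# their order (the 'A,B' pattern above only matches an A directly followed by a B)
filtered = [("A" in ent) and ("B" in ent) for ent in df.grp.array]
df.loc[filtered, "id":"match_id"]

    id  value  match_id
0  A10  grass         1
3  B17  grass         1
4  A20   tree         2
6  B11  grass         1
8  B56   tree         2

Comments:

This is a well-posed question. Thank you for taking the time to put your data and example in a runnable format. The second part is fairly simple, but the first part is trickier. Stand by.

Why is the row with B56, tree, 2 included in the final output? Although the ID contains B, it doesn't contain 2.

@PaulH Thanks. I mean filtering by two conditions: 1) keep rows that have matching integers in the match_id column, and 2) keep rows by the string values in the id column, so that each set of rows with the same match_id contains both A and B. Does that help?

Not really. Are you saying that for each group defined by match_id, there needs to be at least one id starting with "A" and at least one id starting with "B"? For example, the group defined by match_id == 3 only has values starting with "B" in the id column, so that group is excluded?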
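The condition the comments converge on (keep a match_id group only if it has at least one id starting with A and at least one starting with B) can also be sketched with groupby(...).filter. This is not one of the posted answers, just a minimal illustration under the assumption that the letter to test is always the first character of id; the names required and result are introduced here for the example.

```python
import pandas as pd

# Same data as the question's reproducible example.
df_example = pd.DataFrame({
    'id': ['A10', 'B45', 'B98', 'B17', 'A20', 'A87',
           'B11', 'A33', 'B56', 'A23', 'B65'],
    'value': ['grass', 'cow', 'bird', 'grass', 'tree', 'farmer',
              'grass', 'chicken', 'tree', 'farmer', 'cow'],
    'match_id': [1, 3, 6, 1, 2, 5, 1, 4, 2, 5, 3],
})

# Assumption: the letter to test is always the first character of id.
required = {'A', 'B'}

# Keep a match_id group only if the first letters of its ids
# cover every letter in `required`; row order is preserved.
result = df_example.groupby('match_id').filter(
    lambda g: required <= set(g['id'].str[0])
)
print(result)
```

Because the check is a set comparison, it does not depend on the order in which A and B rows appear within a group, and the filtering step itself is a single expression.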