Pandas: selecting rows from a column using a regular expression

regex pandas

I want to select the rows whose value in the column feccandid starts with H or S:

         cid  amount  date catcode  feccandid
0  N00031317    1000  2010   B2000  H0FL19080
1  N00027464    5000  2009   B1000  H6IA01098
2  N00024875    1000  2009   A5200  S2IL08088
3  N00030957    2000  2010   J2200  S0TN04195
4  N00026591    1000  2009   F3300  S4KY06072
5  N00031317    1000  2010   B2000  P0FL19080
6  N00027464    5000  2009   B1000  P6IA01098
7  N00024875    1000  2009   A5200  S2IL08088
8  N00030957    2000  2010   J2200  H0TN04195
9  N00026591    1000  2009   F3300  H4KY06072

I am using the following code:

campaign_contributions.loc[campaign_contributions['feccandid'].astype(str).str.extractall(r'^(?:S|H)')]

It returns the error:

ValueError: pattern contains no capture groups

Does anyone with regex experience know what I am doing wrong?

For something this simple, you can bypass regular expressions entirely:

relevant = campaign_contributions.feccandid.str.startswith('H') | \
    campaign_contributions.feccandid.str.startswith('S')
campaign_contributions[relevant]
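
As a minimal, self-contained sketch of the startswith approach above: the DataFrame below is rebuilt by hand from the question's table (only the cid and feccandid columns), and the names starts_h, starts_s and selected are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the question's table (only the columns needed here).
campaign_contributions = pd.DataFrame({
    "cid": ["N00031317", "N00027464", "N00024875", "N00030957", "N00026591",
            "N00031317", "N00027464", "N00024875", "N00030957", "N00026591"],
    "feccandid": ["H0FL19080", "H6IA01098", "S2IL08088", "S0TN04195", "S4KY06072",
                  "P0FL19080", "P6IA01098", "S2IL08088", "H0TN04195", "H4KY06072"],
})

# Build one boolean mask per prefix and OR them together.
starts_h = campaign_contributions["feccandid"].str.startswith("H")
starts_s = campaign_contributions["feccandid"].str.startswith("S")
relevant = starts_h | starts_s

# Boolean indexing keeps only the rows where the mask is True.
selected = campaign_contributions[relevant]
print(selected)
```

With this sample data, rows 5 and 6 (the P-prefixed values) are the ones dropped.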

However, if you do want to use a regular expression, you can change it to

relevant = ~campaign_contributions['feccandid'].str.extract(r'^(S|H)', expand=False).isnull()

Note that astype is redundant, and that extract is enough in place of extractall.
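
The following sketch illustrates why the original pattern fails and how a capturing group changes that. It assumes a small hand-made Series (three values taken from the question) and passes expand=False so that extract returns a Series rather than a one-column DataFrame; the variable names are illustrative.

```python
import pandas as pd

feccandid = pd.Series(["H0FL19080", "S2IL08088", "P0FL19080"], name="feccandid")

# (?:S|H) is a non-capturing group, so extract has nothing to return and raises.
try:
    feccandid.str.extract(r"^(?:S|H)")
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# (S|H) is a capturing group; expand=False returns a Series instead of a DataFrame.
first_char = feccandid.str.extract(r"^(S|H)", expand=False)

# Rows that do not match come back as missing values, so notna() gives the row mask.
relevant = first_char.notna()
print(feccandid[relevant])
```

The mask can then be used for boolean indexing on the full DataFrame in the same way as the startswith version.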

Why not just use str.match instead of extract and negate?

df[df['col'].str.match(r'^(S|H)')]

(I came here looking for the same answer, but using extract seemed odd, so I looked up the documentation for the str operations.)

Both answers work, but this is the better solution.
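
A short sketch of the str.match suggestion, assuming a hypothetical DataFrame df with a column named col as in the comment above. The character class [SH] is used here as an equivalent way to write (S|H); because str.match only matches at the start of each string, the leading ^ is optional.

```python
import pandas as pd

# Hypothetical frame named after the comment's df['col'] example.
df = pd.DataFrame({"col": ["H0FL19080", "S2IL08088", "P0FL19080", "H4KY06072"]})

# str.match anchors at the start of each string and returns booleans directly,
# so no capture group and no null check are needed.
mask = df["col"].str.match(r"[SH]")
print(df[mask])
```

If the column may contain missing values, str.match accepts an na argument (for example na=False) so the mask stays purely boolean.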