How do I use a regular expression in Python to extract the word that follows one of a list of keywords?


python regex


I am trying to extract a location from a string using a regular expression in Python. Right now I am doing this:

import regex as re  # assumed: the third-party regex module, since capturesdict() is used below

STRIP_CHARS = '*'  # not defined in the original snippet; value borrowed from the answer

def get_location(s):
    s = s.strip(STRIP_CHARS)
    keywords = "at|outside|near"
    location_pattern = r"(?P<location>((?P<place>{keywords}\s[A-Za-z]+)))".format(keywords=keywords)
    location_regex = re.compile(location_pattern, re.IGNORECASE | re.MULTILINE | re.UNICODE | re.DOTALL | re.VERBOSE)

    for match in location_regex.finditer(s):
        match_str = match.group(0)   # full text of the match
        indices = match.span(0)      # (start, end) offsets of the match
        print("Match", match)
        print(match_str)

get_location("Im at building 3")

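I have three questions:

1. It only gives "at" as output, but it should also give "building".

2. captures = match.capturesdict()
   I cannot use this to extract the captures, although it works in other examples.

3. When I use just location_pattern = 'at|outside\s\w+', it seems to work. Can someone explain what I am doing wrong?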


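The main problem here is that you need to put {keywords} inside a non-capturing group: `(?:{keywords})`. Here is a schematic example: `a|b|c\s+\w+` matches either `a`, or `b`, or `c` followed by one or more whitespace characters and one or more word characters. When you put the alternation list into a group, `(a|b|c)\s+\w+`, the pattern matches `a`, `b`, or `c`, and only then tries to match the whitespace and the word characters.

See the updated code: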
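import regex as re

def get_location(s):
    STRIP_CHARS = '*'
    s = s.strip(STRIP_CHARS)
    keywords = "at|outside|near"
    # (?:...) groups the alternation so that \s+[A-Za-z]+ applies to every keyword
    location_pattern = r"(?P<location>((?P<place>(?:{keywords})\s+[A-Za-z]+)))".format(keywords=keywords)
    location_regex = re.compile(location_pattern, re.IGNORECASE | re.UNICODE)
    for match in location_regex.finditer(s):
        match_str = match.group(0)   # full text of the match
        indices = match.span(0)      # (start, end) offsets of the match
        print("Match", match)
        print(match_str)
        captures = match.capturesdict()  # regex-module method: every capture of each named group
        print(captures)

get_location("Im at building 3")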

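Output (as printed by Python 2; under Python 3 the first line appears without the tuple parentheses and quotes):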
('Match', <regex.Match object; span=(3, 14), match='at building'>)
at building
{'place': ['at building'], 'location': ['at building']}


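Note that location_pattern = 'at|outside\s\w+' does not work, because `at` on its own matches anywhere, while only `outside` must be followed by a whitespace character and word characters. You can fix it the same way: `(at|outside)\s\w+`.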

If you put the keywords into a group, captures = match.capturesdict() will work fine (see the output above).
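
For comparison, here is a minimal sketch for cases where only the standard-library `re` module is available. `re` has no `capturesdict()`; its closest counterpart is `match.groupdict()`, which returns a single string per named group rather than a list of captures. The helper name `named_groups` and the group names `keyword` and `word` are hypothetical and not part of the original code:

```python
import re

def named_groups(text, keywords=("at", "outside", "near")):
    """Return groupdict() for every keyword-plus-word match in text."""
    # Escape each keyword so it is treated literally, then join into an alternation.
    alternation = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(
        r"(?P<location>(?P<keyword>{kw})\s+(?P<word>[A-Za-z]+))".format(kw=alternation),
        re.IGNORECASE,
    )
    return [m.groupdict() for m in pattern.finditer(text)]

result = named_groups("Im at building 3")
print(result)
```

Here the alternation sits inside the named group `keyword`, so it is grouped in the same way as with `(?:...)`.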
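Can you post a sample of the text you are searching? Could you add a couple of strings to the question, together with the expected output?

I removed the MULTILINE, VERBOSE and DOTALL modifiers in the demo because the regex does not use any of the features they affect.

Thanks, is there a way to get only "building" and not "at"?

@user3667569: Wrap the `[A-Za-z]+` part in capturing parentheses and read the corresponding `.group(3)` or `.group(4)` value (`.group(3)` if, as in my example, the outer numbered capturing group is removed).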
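
The suggestion in the last comment can be sketched as follows, again with the standard-library `re` module. The function name `words_after_keywords` and the sample sentence are hypothetical. The `\b` word boundary is an addition that the answer does not use; it is included here on the assumption that a keyword such as "at" should not match at the end of a longer word like "that":

```python
import re

def words_after_keywords(text, keywords):
    """Return only the word that follows each keyword, without the keyword."""
    alternation = "|".join(re.escape(k) for k in keywords)
    # Group 1 captures just the word; the keyword itself is in a non-capturing group.
    pattern = re.compile(
        r"\b(?:{kw})\s+([A-Za-z]+)".format(kw=alternation),
        re.IGNORECASE,
    )
    return [m.group(1) for m in pattern.finditer(text)]

words = words_after_keywords(
    "Im at building 3, past that hall, near gate 4",
    ["at", "outside", "near"],
)
print(words)  # ['building', 'gate']
```

Because the pattern has only one capturing group, the word is available as `.group(1)`; in the answer's pattern, with its extra named groups, the same capture ends up at a higher group number.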