Add a delimiter to a string before a certain pattern in Python

python regex

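I have a list of words:

["area", "building", "street no", "floor"]

If any of these words is followed by a colon (:) in a string, I need to add a delimiter (preferably a comma) before that word. For example:

sample_input = "area : al mansorah street no : 30 building : xyz tower floor : 3rd"

expected_output = "area: al mansorah, street no: 30, building: xyz tower, floor: 3rd"

This is my current implementation: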

        import re

        sentence = "area : al mansorah street no    : 30 building : 6 floor : 3rd"
        words = ["area", "building", "street no", "floor"]
        for x in words:
            # match the word followed by whitespace and a colon
            regex = re.escape(x) + r"\s+:"
            rep_str = ", " + x + ":"
            sentence = re.sub(regex, rep_str, sentence)

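This works, but it is inefficient because I have hundreds of such words to check. It also does not cover edge cases: no delimiter should be added when the word is the first one in the string, or when a delimiter is already present.

Any help would be appreciated.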

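The regex you are probably looking for is

([^,\s])(\s+(?:your|words|here)\s*:)

because it suits Python well and can grow dynamically. You can use a for loop to build a regex hundreds of words long and then run it once, instead of using a for loop to run a regex hundreds of times.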

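```
([^\s,])
```
captures one non-comma, non-whitespace character. If there is already a comma, or the word is the first one in the line, nothing matches and the text is left alone.
```
(\s+(?:your|words|here)\s*:)
```
captures one or more whitespace characters, followed by any word from your list, optional whitespace, and a colon.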
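To make the role of the two groups concrete, here is a minimal sketch that prints what each group captures. The two-word list and the helper name `show_groups` are assumptions chosen for illustration only.

```python
import re

# Hypothetical two-word list standing in for "your|words|here".
pattern = re.compile(r"([^,\s])(\s+(?:area|floor)\s*:)")

def show_groups(text):
    """Return (group 1, group 2) of the first match, or None if nothing matches."""
    match = pattern.search(text)
    return match.groups() if match else None

# Group 1 is the character before the whitespace, group 2 is the rest.
print(show_groups("tower floor : 3rd"))   # ('r', ' floor :')
# A comma is already there, so group 1 cannot match.
print(show_groups("tower, floor : 3rd"))  # None
# First word in the line: there is no preceding character at all.
print(show_groups("floor : 3rd"))         # None
```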

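# first part of the regex string
rex_str = r"([^,\s])(\s+(?:"
# first word
rex_str += words[0]
# put the remaining words into the non-capturing group
for i in range(1, len(words)):
    rex_str += "|"
    rex_str += words[i]
# close the regex
rex_str += r")\s*:)"
# add a comma between the first and second capture groups
sentence = re.sub(rex_str, r"\g<1>,\g<2>", sentence)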
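The loop above can also be written with `str.join`. The following is a sketch rather than the answerer's exact code: the helper name `build_pattern`, the use of `re.escape`, and the longest-first ordering are additions I am assuming are desirable when the word list is large and may contain regex metacharacters or words that are prefixes of one another.

```python
import re

def build_pattern(words):
    # Longest words first, so a longer phrase wins over a shorter word
    # that is a prefix of it; re.escape protects regex metacharacters.
    ordered = sorted(words, key=len, reverse=True)
    alternation = "|".join(re.escape(w) for w in ordered)
    return re.compile(r"([^,\s])(\s+(?:" + alternation + r")\s*:)")

words = ["area", "building", "street no", "floor"]
pattern = build_pattern(words)  # compiled once, reused for every sentence

sentence = "area : al mansorah street no    : 30 building : 6 floor : 3rd"
result = pattern.sub(r"\g<1>,\g<2>", sentence)
print(result)
# area : al mansorah, street no    : 30, building : 6, floor : 3rd
```

Note that this pattern only inserts the comma. The original spacing around each colon is left untouched.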

With the following statement, you get almost what you want:

sentence= "area : al mansorah street no    : 30 building : 6 floor : 3rd"
words = ["area", "building", "street no", "floor"]

sentence = re.sub(r"(?<!^)\s*({})\s*:".format('|'.join(words)), ", \\1:", sentence)

sentence
'area : al mansorah, street no: 30, building: 6, floor: 3rd'
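The `(?<!^)` in that pattern is a negative lookbehind that only fails at position 0 of the string, which is why the leading `area :` is left alone. A small sketch with a shortened, hypothetical word list shows the effect, and also shows that a single leading space is enough to defeat it:

```python
import re

words = ["area", "floor"]  # shortened list, for illustration only
pattern = re.compile(r"(?<!^)\s*({})\s*:".format("|".join(words)))

# The word at position 0 is skipped; later words get ", " and a tidy colon.
first = pattern.sub(r", \1:", "area : x floor : 3rd")
print(first)   # area : x, floor: 3rd

# With a leading space the word no longer starts at position 0,
# so the lookbehind succeeds and a stray comma is inserted.
second = pattern.sub(r", \1:", " area : x")
print(second)  # " , area: x"
```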

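Can you explain what exactly `(?` is doing here?

It is a whitespace boundary. There should be whitespace before the match, or the start of the string.

Thanks for the detailed answer, it is working. But it does not cover the case where the delimiter already exists. If a delimiter is already there, it should not add another one.

Please share an example that fails. :) That will make it easier to help. Thanks!

"area: al mansorah, street no 30, building 6, floor 3"
This fails here because it adds an extra delimiter where one already exists. Expected output: "area: al mansorah, street no 30, building 6, floor 3"

Cool. I edited the answer and changed the regex so that it no longer fails when the delimiter already exists.

Thanks for the detailed answer. However, I need to add the delimiter only when a word from the given list is followed by a colon. This adds a comma even when the word is not followed by a colon. "area:al mansorah street no:30 building 6 floor:3rd"
Cases like this fail here, because I do not want a comma before building.

This works well, but the only problem is that it does not preserve the formatting. "area:al mansorah street no:30 building 6 floor:3rd"
should be formatted like "area: al mansorah, street no 30, building 6, floor 3rd". Everything else works well, so I am accepting this answer. Thanks for adding the Regex demo, as it really helped me understand the regex fully.
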
# format words when they are the first word in the sentence
sentence = re.sub(r"^\s*({})\s*:".format('|'.join(words)), "\\1:", sentence)

# format words when they are not the first word in the sentence
sentence = re.sub(r"(?<!^)\s*({})\s*:".format('|'.join(words)), ", \\1:", sentence)

sentence = "area : al mansorah street no : 30, building : 6, floor : 3rd"
words = ["area", "building", "street no", "floor"]

# format words when they are the first word in the sentence
sentence = re.sub(r"^[\s,]*({})\s*:".format('|'.join(words)), "\\1:", sentence)

# format words when they are not the first word in the sentence
sentence = re.sub(r"(?<!^)[\s,]*({})\s*:".format('|'.join(words)), ", \\1:", sentence)

sentence
'area: al mansorah, street no: 30, building: 6, floor: 3rd'
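For hundreds of words it may help to wrap the two substitutions in one function. The sketch below is my own packaging of the two steps above, not part of the answer: the name `add_delimiters`, the `delimiter` parameter, and the `re.escape` and longest-first sorting are assumptions.

```python
import re

def add_delimiters(sentence, words, delimiter=", "):
    # Escape each word and try longer words first.
    ordered = sorted(words, key=len, reverse=True)
    alternation = "|".join(re.escape(w) for w in ordered)
    first = re.compile(r"^[\s,]*({})\s*:".format(alternation))
    rest = re.compile(r"(?<!^)[\s,]*({})\s*:".format(alternation))
    # Step 1: tidy a listed word at the very start, without a delimiter.
    sentence = first.sub(lambda m: m.group(1) + ":", sentence)
    # Step 2: every later "word :" gets exactly one delimiter in front.
    return rest.sub(lambda m: delimiter + m.group(1) + ":", sentence)

words = ["area", "building", "street no", "floor"]

print(add_delimiters("area : al mansorah street no : 30, building : 6, floor : 3rd", words))
# area: al mansorah, street no: 30, building: 6, floor: 3rd

# "building" has no colon here, so it gets no delimiter.
print(add_delimiters("area:al mansorah street no:30 building 6 floor:3rd", words))
# area:al mansorah, street no:30 building 6, floor:3rd
```

The second call also illustrates the formatting remark from the comments: the text after each colon is passed through unchanged, so no space is added after `area:`.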