Python regex function to remove all numbers and punctuation except a specified expression?

python regex string

I have a Python string:

string = "Hello I am a 21 !string. In section 3.2.F.1.2 we covered 1topic X. On the oth1er hand, in section 1.2.F.1.1 we covered Y. Lastly, in section 1.1.F.3.2 we 23 covered Z."

I need to remove the stray numbers and punctuation from the text, so that:

"21 !string" --> "... string ..." and

"covered 1topic X." --> "covered topic x"

My final string should be:

filtered = "hello i am a string in section 3.2.F.1.2 we covered topic x on the other hand in section 1.2.F.1.1 we covered y lastly in section 1.1.F.3.2 we covered z"

...leaving the codes "3.2.F.1.2", "1.2.F.1.1" and "1.1.F.3.2" untouched.

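I was able to come up with regular expressions for the codes and for the characters to remove:

regex_codes = r"[\d\.]{1,4}F[\.\d]{1,4}"

all_nums_punct = r'''[0-9 _.,!"'/$]*'''

What I can't figure out is how to "select and remove all numbers and punctuation (all_nums_punct) except for these code patterns (regex_codes)".

I tried using a negative lookahead to ignore everything that starts at one of the codes, but my pattern ends up selecting nothing.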

Using the regex package from the PyPI repository (installed with pip install regex):

import regex

string = "Hello I am a 21 !string. In section 3.2.F.1.2 we covered 1topic X. On the oth1er hand, in section 1.2.F.1.1 we covered Y. Lastly, in section 1.1.F.3.2 we 23 covered Z."
string = regex.sub(r'''[\d\.]{1,4}F[\.\d]{1,4}(*SKIP)(*FAIL)|[0-9_.,!"'/$]''', '', string)
print(string)

Prints:

Hello I am a  string In section 3.2.F.1.2 we covered topic X On the other hand in section 1.2.F.1.1 we covered Y Lastly in section 1.1.F.3.2 we  covered Z

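We match either your regex_codes expression or a single one of your all_nums_punct characters (without the space character). If the regex_codes expression matches, (*SKIP)(*FAIL) makes the engine skip over those characters and fail that match, so a code is never reported as a match and therefore never replaced. Everywhere else the engine falls through to the second alternative, which matches one unwanted character to delete.

The result may contain runs of consecutive spaces. You need a second substitution to replace each run with a single space: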

import regex

string = "Hello I am a 21 !string. In section 3.2.F.1.2 we covered 1topic X. On the oth1er hand, in section 1.2.F.1.1 we covered Y. Lastly, in section 1.1.F.3.2 we 23 covered Z."
string = regex.sub(r'''[\d\.]{1,4}F[\.\d]{1,4}(*SKIP)(*FAIL)|[0-9_.,!"'/$]''', '', string)
string = regex.sub(r' +', ' ', string)
print(string)

Prints:

Hello I am a string In section 3.2.F.1.2 we covered topic X On the other hand in section 1.2.F.1.1 we covered Y Lastly in section 1.1.F.3.2 we covered Z
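The target string in the question is also lowercase everywhere except inside the codes, which the substitutions above do not handle. Here is a minimal sketch of one way to get there with only the standard library re module. The helper name normalize and the constants CODE and JUNK are made up for illustration, and this is not part of the original answer. It splits the text on the codes, cleans and lowercases only the pieces between them, and then collapses the spaces:

```python
import re

# One capturing group, so re.split keeps the codes in the result list.
CODE = re.compile(r"([\d.]{1,4}F[.\d]{1,4})")
# The characters to delete (no space character in this class).
JUNK = re.compile(r"""[0-9_.,!"'/$]""")


def normalize(text):
    """Hypothetical helper: strip junk and lowercase everything except codes."""
    pieces = CODE.split(text)
    # With a single capturing group, even indices hold ordinary text
    # and odd indices hold the captured codes.
    cleaned = [
        piece if index % 2 else JUNK.sub("", piece).lower()
        for index, piece in enumerate(pieces)
    ]
    # Collapse runs of spaces left behind by the deletions.
    return re.sub(r" +", " ", "".join(cleaned)).strip()


text = ("Hello I am a 21 !string. In section 3.2.F.1.2 we covered 1topic X. "
        "On the oth1er hand, in section 1.2.F.1.1 we covered Y. "
        "Lastly, in section 1.1.F.3.2 we 23 covered Z.")
filtered = normalize(text)
print(filtered)
```

Splitting first means the cleanup pattern never sees the codes at all, so no lookahead or backtracking verb is needed.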

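Update

I will try to answer the question you asked @WiktorStribiżew about how the following solution works:

re.sub(r'''([.\d]{1,4}F[.\d]{1,4})|[0-9_.,!"'/$]''', r'\1', string)

Whatever the regex matches is replaced with r'\1', which stands for the value of capture group 1. If the regex matches a regex_codes expression, capture group 1 is set to that match, so the matched string is replaced with itself and nothing changes. If, however, the regex matches one of the characters you want removed, capture group 1 is empty and the matched character is replaced with an empty string. This method does not require the regex package. It also leaves consecutive spaces behind, which, as I pointed out, you may want to remove.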
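As a rough, self-contained sketch of the capture-group approach using only the standard library: the callback keep_codes and the variable names are hypothetical and are used here instead of r'\1' so the empty-group case is explicit (Python versions before 3.5 raise an error when a replacement template refers to a group that did not take part in the match).

```python
import re

text = ("Hello I am a 21 !string. In section 3.2.F.1.2 we covered 1topic X. "
        "On the oth1er hand, in section 1.2.F.1.1 we covered Y. "
        "Lastly, in section 1.1.F.3.2 we 23 covered Z.")

# First alternative: a code, captured in group 1.
# Second alternative: one character to delete (nothing is captured).
pattern = re.compile(r'''([.\d]{1,4}F[.\d]{1,4})|[0-9_.,!"'/$]''')


def keep_codes(match):
    # group(1) is None when the second alternative matched, so the
    # unwanted character is replaced by the empty string.
    return match.group(1) or ""


result = pattern.sub(keep_codes, text)
# Collapse the runs of spaces left behind by the deletions.
result = re.sub(r" +", " ", result)
print(result)
# Hello I am a string In section 3.2.F.1.2 we covered topic X On the other hand in section 1.2.F.1.1 we covered Y Lastly in section 1.1.F.3.2 we covered Z
```

The code alternative has to come first: at each position the engine tries alternatives from left to right, so a whole code is consumed and kept before the single-character alternative can pick it apart.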

Comments:

So you want to remove all standalone numbers? Then you could not keep a sentence like "Hello, I am 21 years old."

all_nums_punct contains a space character. Are you sure you want to remove all spaces? Well, if your regexps work, you can use re.sub(r'''([.\d]{1,4}F[.\d]{1,4})|[0-9_.,!"'/$]''', r'\1', text)

@JohnGordon That is correct, yes! @WiktorStribiżew Could you explain what is going on here? I understand that you are substituting one expression with another. What I don't understand is the use of | to separate the two, and the use of r'\1'.

Thanks! Could you explain the (*SKIP)(*FAIL) feature? What happens on the back end?

That question has already been asked and answered. I will also update my answer and try to explain the solution proposed by @WiktorStribiżew, although I am sure he can articulate it better (the explanation would probably be too long for a comment).