Python findall "bad escape" error regarding the S and P classes


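I am trying to remove all punctuation and special characters, including digits, from a string, but I get this error:
error: bad escape \p at position 2

Does this mean that Python's regex engine does not recognize
\p{S}
\p{P}

The code is: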

import re

name = "URL-dsds diasa:dksdjsk dskdjs_dskjdks 23232 dsds32 dskdjskds&dsjdsjdhs fddjfd%djshdhjs kdjs¤dskjds öfdfdjfkdj"
re.findall(r'[^\p{P}\p{S}\s\d]+', name.lower())
I would like the output to match what regex101 highlights:

Any help?

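Yes, unfortunately that is the case.

Go to regex101.com, change the flavor to Python, and paste the regex into the field at the top:

On the right it gives you the following explanation: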

[^\p{P}\p{S}\s\d]+

gm <Python>
Match a single character not present in the list below [^\p{P}\p{S}\s\d]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
\p matches the character p literally (case sensitive) <<<<<<<<<<<<<<<<<<<<<<<<<<<<
{P} matches a single character in the list {P} (case sensitive)<<<<<<<<<<<<<<<<<<
\p matches the character p literally (case sensitive)
{S} matches a single character in the list {S} (case sensitive)
\s matches any whitespace character (equal to [\r\n\t\f\v ])
\d matches a digit (equal to [0-9])
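
The behaviour can also be checked locally rather than on regex101. The following is a minimal sketch, assuming a recent Python 3 interpreter; the helper name compile_error is made up for illustration. It tries to compile the pattern from the question with the standard re module and reports the error message, which on current versions should resemble the "bad escape \p" error quoted in the question.

```python
import re

# Pattern from the question; \p{...} is not a construct the stdlib re module supports.
PATTERN = r'[^\p{P}\p{S}\s\d]+'


def compile_error(pattern):
    """Return the re.error message for pattern, or None if it compiles.

    Hypothetical helper, used here only to make the failure visible.
    """
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


message = compile_error(PATTERN)
print(message)
```

A None result would mean the interpreter accepted the pattern, in which case \p is most likely being treated as a literal p, as the regex101 explanation describes.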
Following @WiktorStribiżew's comment, I used the PyPI regex module, because it supports Unicode category classes. So I did this:

pip install regex
import regex as re
name = "URL-dsds diasa:dksdjsk dskdjs_dskjdks 23232 dsds32 dskdjskds&dsjdsjdhs fddjfd%djshdhjs kdjs¤dskjds öfdfdjfkdj"
re.findall(r'[^\p{P}\p{S}\s\d]+', name.lower())
and I get the output:

['url', 'dsds', 'diasa', 'dksdjsk', 'dskdjs', 'dskjdks', 'dsds', 'dskdjskds', 'dsjdsjdhs', 'fddjfd', 'djshdhjs', 'kdjs', 'dskjds', 'öfdfdjfkdj']
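
If installing a third-party package is not an option, roughly the same filtering can be approximated with the standard library alone. This is a sketch of a different technique, not what the answer above does: it uses unicodedata.category instead of a regular expression, and the helper names is_word_char and extract_words are made up for illustration. Treating str.isspace() and category "Nd" as stand-ins for \s and \d is an assumption.

```python
import unicodedata
from itertools import groupby


def is_word_char(ch):
    """True unless ch is punctuation (P*), a symbol (S*), whitespace, or a decimal digit."""
    category = unicodedata.category(ch)
    return not (category[0] in "PS" or ch.isspace() or category == "Nd")


def extract_words(text):
    """Collect maximal runs of characters accepted by is_word_char."""
    return ["".join(group) for keep, group in groupby(text, key=is_word_char) if keep]


name = "URL-dsds diasa:dksdjsk dskdjs_dskjdks 23232 dsds32 dskdjskds&dsjdsjdhs fddjfd%djshdhjs kdjs¤dskjds öfdfdjfkdj"
words = extract_words(name.lower())
print(words)
```

For the sample string this yields the same fourteen lowercase words, with "¤" (a currency symbol) and "_" (connector punctuation) acting as separators.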


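With the PyPI regex module you can use Unicode category classes. Alternatively, since you only need to match letters, just use r'[^\W\d_]+'.

Please check the existing question that asks the same thing.

It is not the same. @ZenZac, look at the question and at the solution Wiktor suggested; they are completely different from the link you shared. Close it against the correct thread.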
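
The standard-library alternative from the comment can be sketched as follows, using the sample string from the question. The character class [^\W\d_] means "a word character that is neither a digit nor an underscore". For str patterns in Python 3, \w is Unicode-aware, so "ö" counts as a letter. Note this is an approximation of "letters only": a few non-letter alphanumerics that \d does not cover (for example superscript digits) would still match.

```python
import re

name = "URL-dsds diasa:dksdjsk dskdjs_dskjdks 23232 dsds32 dskdjskds&dsjdsjdhs fddjfd%djshdhjs kdjs¤dskjds öfdfdjfkdj"

# [^\W\d_]+ : one or more word characters, excluding digits and the underscore.
words = re.findall(r'[^\W\d_]+', name.lower())
print(words)
```

For this input the result contains the same fourteen words as the regex-module version, because every separator in the string ("-", ":", "_", "&", "%", "¤", spaces, digits) falls outside the class.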