How to write a regular expression in Python to find multiple strings


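For example, I have a string like

"look[+3],panel button layout[+3],feature[+2]it 's very sleek looking with a very good front panel button layout , and it has a great feature set . "
"look[+3]" means that the sentence concerns the "look" aspect of an item, and [+3] means that the opinion on that aspect is positive, with a score of 3. (This actually comes from the Amazon review dataset.)

I want to split it into two parts: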

X: "it 's very sleek looking with a very good front panel button layout , and it has a great feature set ."

Y: [("look", 3), ("panel button layout", 3), ("feature", 2)]

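One option is to capture everything after the start of the string or a comma, up to the next `[`, and to extract the number that follows `[+`: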

>>> import re
>>> s = "look[+3],panel button layout[+3],feature[+2]it 's very sleek looking with a very good front panel button layout , and it has a great feature set . "
>>> re.findall(r"(?:^|,)(.*?)\[\+?(\-?\d+)\]", s)
[('look', '3'), ('panel button layout', '3'), ('feature', '2')]
>>>
>>> s = "darn diopter adjustment dial[-1]"
>>> re.findall(r"(?:^|,)(.*?)\[\+?(\-?\d+)\]", s)
[('darn diopter adjustment dial', '-1')]
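where:

  • (?:^|,)
    is a non-capturing group that matches either the start of the string or a comma
  • (.*?)
    is a non-greedy match of any character, any number of times
  • \[\+?(\-?\d+)\]
    matches an opening [, followed by an optional +, followed by a capturing group that captures one or more digits (with an optional leading -), followed by a closing ]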
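The pattern above only produces Y, and the scores come back as strings. As a minimal sketch of getting both X and Y in one pass, the same pattern can be applied repeatedly from the front of the line, and whatever is left after the last tag is taken as the sentence. The function name `split_review` is hypothetical, and the sketch assumes the tags always form one contiguous prefix of the line.

```python
import re

# Same pattern as in the answer: start-of-string or comma, aspect name, [+N] or [-N].
TAG = re.compile(r"(?:^|,)(.*?)\[\+?(-?\d+)\]")

def split_review(line):
    """Return (sentence, [(aspect, score), ...]) for one tagged review line."""
    pairs = []
    pos = 0
    # Pattern.match(line, pos) only matches at pos, so the loop stops at the
    # first position where no further tag starts (i.e. where the sentence begins).
    m = TAG.match(line, pos)
    while m:
        pairs.append((m.group(1), int(m.group(2))))
        pos = m.end()
        m = TAG.match(line, pos)
    return line[pos:].strip(), pairs

s = ("look[+3],panel button layout[+3],feature[+2]it 's very sleek looking "
     "with a very good front panel button layout , and it has a great feature set . ")
X, Y = split_review(s)
print(X)
print(Y)
```

Using `match` with a start position, rather than `findall` over the whole line, keeps a stray bracketed number inside the sentence from being picked up as a tag after the prefix has ended.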
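    • With the pattern ([^\]]+[[^\]])+(.*) shown further down, your key/val pairs are in $1 and the summary is in $2.

      Edit: although re does not support multiple matches per group (only the last capture is available), the third-party regex module does, through the captures() method used in the session at the end.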


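      You can use re.findall('(.*\[\+\d+\],?)', s) as a starting point for the Y output you want. Note that .* is greedy, so on the example string this returns the whole tag prefix as a single match; the lazy form .*? yields one match per tag.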
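The behaviour of that suggestion can be checked directly. The sketch below runs the pattern as posted and, for comparison, a lazy variant with `.*?`. The lazy variant is an assumption about what was intended and is not part of the original suggestion.

```python
import re

s = ("look[+3],panel button layout[+3],feature[+2]it 's very sleek looking "
     "with a very good front panel button layout , and it has a great feature set . ")

# As posted: the greedy .* runs to the last "[+N]" on the line.
greedy = re.findall(r"(.*\[\+\d+\],?)", s)

# Lazy variant: .*? stops at the first "[+N]", giving one match per tag.
lazy = re.findall(r"(.*?\[\+\d+\],?)", s)

print(greedy)  # ['look[+3],panel button layout[+3],feature[+2]']
print(lazy)    # ['look[+3],', 'panel button layout[+3],', 'feature[+2]']
```

Both forms require a literal `+`, so a negative tag such as `[-1]` is not matched by either.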

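      Hmm, isn't it possible to scrape the reviews more properly, so that you don't need a regex to extract the information?

      Are you sure there is no (...) after feature[+2]?

      I think re.finditer() would be better for answering the other part of the question (X: "it 's ..."), but re.sub() would probably work just as well.

      darn diopter adjustment dial[-1]##The reason I rated it a 4 is that darn diopter adjustment dial. It is tiny and hard to turn, so you cannot get an accurate adjustment (for those who do not know what a diopter adjustment is, it adjusts the focus of the viewfinder to your eyesight.)
      []
      I found that it does not work in this case.

      Where does that ## come from?

      You can ignore it.

      @Vicky please check the update now. It handles both + and - before the number.

      Did you look at the link you posted? The expression only finds the last key. I get
      (',feature[+2]', "it 's very sleek looking with a very good front panel button layout , and it has a great feature set . ")
      Am I doing something wrong? I have tried several re functions, but I cannot seem to get the right output with this regex.

      Python supports those match groups; the online tool does not. It works.

      Then please post the correct call to extract
      [("look", 3), ("panel button layout", 3), ("feature", 2)]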
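The comments mention lines in which the tags and the sentence are separated by `##`. For such lines, a possible approach is to split on the marker first and only then parse the tags. This is a rough sketch under the assumption that every tagged line contains exactly one `##`; the function name `split_marked_line` and the shortened example sentence are made up for illustration.

```python
import re

# Assumed tag shape: a name without commas or brackets, then [+N] or [-N].
TAG = re.compile(r"([^,\[\]]+)\[([+-]?\d+)\]")

def split_marked_line(line, marker="##"):
    """Split 'tags##sentence' into (sentence, [(aspect, score), ...])."""
    head, sep, sentence = line.partition(marker)
    if not sep:
        # No marker found: this sketch treats the whole line as the sentence.
        return line.strip(), []
    pairs = [(name.strip(), int(score)) for name, score in TAG.findall(head)]
    return sentence.strip(), pairs

# Shortened, illustrative input based on the line quoted in the comments.
example = "darn diopter adjustment dial[-1]##it is tiny and hard to turn ."
print(split_marked_line(example))
```

Splitting on the marker before matching means the sentence text is never scanned by the tag pattern.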
      ([^\]]+[[^\]])+(.*)
      
      >>> import regex  # third-party module (pip install regex), not the standard-library re
      >>> m = regex.search(r"([^\]]+[[^\]])+(.*)", "look[+3],panel button layout[+3],feature[+2]it 's very sleek looking with a very good front panel button layout , and it has a great feature set . ")
      >>> m.group(1)
      ',feature[+2]'
      >>> m.captures(1)
      ['look[+3]', ',panel button layout[+3]', ',feature[+2]']
      >>> m.group(2)
      "it 's very sleek looking with a very good front panel button layout , and it has a great feature set . "
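The captures in that session are still raw strings such as `',feature[+2]'`. One way to turn them into the requested Y list, sketched here with the standard-library `re` only, is to parse each capture separately. The helper name `parse_capture` is hypothetical, and the list literal simply copies the `m.captures(1)` output shown above.

```python
import re

# Output of m.captures(1) from the session above, copied as a literal.
captures = ['look[+3]', ',panel button layout[+3]', ',feature[+2]']

def parse_capture(text):
    """Turn one capture such as ',feature[+2]' into ('feature', 2)."""
    m = re.fullmatch(r",?(.*)\[([+-]?\d+)\]", text)
    if m is None:
        raise ValueError(f"unexpected capture: {text!r}")
    # int() accepts an explicit sign, so '+2' becomes 2 and '-1' becomes -1.
    return m.group(1), int(m.group(2))

Y = [parse_capture(c) for c in captures]
print(Y)
```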