Extracting data from a Python regular expression match

I have the following multi-line string:

/*dummy comment */

/* comment about sum function jkhkdhfljkldjf
  kjsdkjflskj
*/

int sum(int a,int b);

/*comment about mul function */ 

int mul(int a,int b);
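If I use the following regex, I get two matches:

regex -> (?P<desc>(\/\*[\s\S]+?\*\/$))(?P<fun>\s*int\s*\b\w+\b\s*\(\w+\s+.+\s*(?:;$))
Match #2:

For match #1 I get two comments, but I only need the last one, i.e. /* comment about sum function jkhkdhfljkldjf kjsdkjflskj */. I don't want /*dummy comment */ to be matched.

Please help me get the following output:

Match #1:

Match #2: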


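I can't debug your regex because it appears to be incorrectly formatted in the example, so here is a working snippet showing how to do it. The comments explain how the regex works, so read them carefully.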

import re

# sample text as in the question
sample_str = """/*dummy comment */

/* comment about sum function */

int sum(int a,int b);

/*comment about mul function */ 

int mul(int a,int b);"""

# Match the regex below and capture its match into a backreference named “desc” (also backreference number 1) «(?P<desc>/\*\s*comment about (?P<func_name>[^\s]+?) function \*/)»
#    Match the character “/” literally «/»
#    Match the character “*” literally «\*»
#    Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «\s*»
#       Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
#    Match the character string “comment about ” literally (case sensitive) «comment about »
#    Match the regex below and capture its match into a backreference named “func_name” (also backreference number 2) «(?P<func_name>[^\s]+?)»
#       Match a single character that is NOT a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «[^\s]+?»
#          Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
#    Match the character string “ function ” literally (case sensitive) « function »
#    Match the character “*” literally «\*»
#    Match the character “/” literally «/»
# Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «\s*»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
# Match the carriage return character «\r*»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
# Match the line feed character «\n*»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
# Match the regex below and capture its match into a backreference named “fun” (also backreference number 3) «(?P<fun>(?P<return_type>[^\s]+?) (?P<func_name_2>[^\s]+?)\((?P<arguments>[^\)]+?)\))»
#    Match the regex below and capture its match into a backreference named “return_type” (also backreference number 4) «(?P<return_type>[^\s]+?)»
#       Match a single character that is NOT a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «[^\s]+?»
#          Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
#    Match the character “ ” literally « »
#    Match the regex below and capture its match into a backreference named “func_name_2” (also backreference number 5) «(?P<func_name_2>[^\s]+?)»
#       Match a single character that is NOT a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «[^\s]+?»
#          Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
#    Match the opening parenthesis character «\(»
#    Match the regex below and capture its match into a backreference named “arguments” (also backreference number 6) «(?P<arguments>[^\)]+?)»
#       Match any character that is NOT the closing parenthesis character «[^\)]+?»
#          Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
#    Match the closing parenthesis character «\)»

function_re = re.compile(r"(?P<desc>/\*\s*comment about (?P<func_name>[^\s]+?) function \*/)\s*\r*\n*(?P<fun>(?P<return_type>[^\s]+?) (?P<func_name_2>[^\s]+?)\((?P<arguments>[^\)]+?)\))")

for function_match in function_re.finditer(sample_str):
    # match start: function_match.start()
    # match end (exclusive): function_match.end()
    # matched text: function_match.group()
    print("\ndesc:\n\n{}\n".format(function_match.group("desc")))
    print("fun:\n\n{}\n\n--------".format(function_match.group("fun")))
    # Additional groups if you need them
    # print("Func Name 1: {}".format(function_match.group("func_name")))
    # print("Func Name 2: {}".format(function_match.group("func_name_2")))
    # print("Arguments  : {}".format(function_match.group("arguments")))

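I tried to edit the post, but without success. Can you format your code?

Hi Lorenzo Persichetti, thanks for your reply, but the comment can be /*jkhdfhs dfhkjs*/ or it can span multiple lines. Can we get only the single comment directly above the function declaration? The comment may be single-line or multi-line, but it always starts with /* and ends with */.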
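For the follow-up above (any single-line or multi-line comment, but only the one directly above the declaration), one possible approach is to forbid `*/` inside the comment body. A lazy `[\s\S]+?` still starts at the first `/*` and keeps growing until the rest of the pattern matches, which is how an earlier comment can end up inside `desc`. The sketch below is only an illustration: it assumes comments are not nested and that declarations have the simple `type name(args);` shape, and the name `declaration_re` is made up for this example.

```python
import re

# Sample text from the question, including the multi-line comment.
sample_str = """/*dummy comment */

/* comment about sum function jkhkdhfljkldjf
  kjsdkjflskj
*/

int sum(int a,int b);

/*comment about mul function */

int mul(int a,int b);"""

# desc: "/*", then any character that does not start a "*/", then "*/".
#       The negative lookahead keeps desc inside one single comment.
# fun:  a simplified declaration (assumption): type, name, "(...)" and ";".
declaration_re = re.compile(
    r"(?P<desc>/\*(?:(?!\*/)[\s\S])*\*/)"  # exactly one /* ... */ comment
    r"\s*"                                  # only whitespace before the declaration
    r"(?P<fun>\w+\s+\w+\s*\([^)]*\)\s*;)"   # e.g. int sum(int a,int b);
)

matches = [(m.group("desc"), m.group("fun"))
           for m in declaration_re.finditer(sample_str)]

for desc, fun in matches:
    print("desc:\n\n{}\n\nfun:\n\n{}\n\n--------".format(desc, fun))
```

On the question's sample this gives two matches, and the dummy comment is skipped because only whitespace is allowed between the closing `*/` and the declaration.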
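Match #2 (current output, from the question):

desc:

/* comment about mul function */

fun:

 int mul(int a,int b);

Match #1 (desired output, from the question):

desc:

/* comment about sum function jkhkdhfljkldjf
  kjsdkjflskj*/

fun:

int sum(int a,int b);

Match #2 (desired output, from the question):

desc:

/* comment about mul function */

fun:

 int mul(int a,int b);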
Output of the answer's snippet:

desc:

/* comment about sum function */

fun:

int sum(int a,int b)

--------

desc:

/*comment about mul function */

fun:

int mul(int a,int b)

--------
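
If the additional named groups are needed, `groupdict()` returns all of them at once. The following is a small sketch that reuses the pattern from the snippet on a shortened sample. Splitting the argument list into (type, name) pairs is an assumption that only holds for simple arguments such as `int a`.

```python
import re

# Same pattern as in the answer's snippet.
function_re = re.compile(r"(?P<desc>/\*\s*comment about (?P<func_name>[^\s]+?) function \*/)\s*\r*\n*(?P<fun>(?P<return_type>[^\s]+?) (?P<func_name_2>[^\s]+?)\((?P<arguments>[^\)]+?)\))")

# Shortened sample with a single commented declaration.
sample_str = """/* comment about sum function */

int sum(int a,int b);"""

match = function_re.search(sample_str)

# groupdict() maps every named group to the text it captured.
groups = match.groupdict()

# Split the raw argument text into (type, name) pairs (simple arguments only).
arguments = [tuple(arg.split()) for arg in groups["arguments"].split(",")]

print(groups["return_type"], groups["func_name_2"], arguments)
```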