Extracting data from a matching Python regular expression

python regex

I have the following multi-line string:

/*dummy comment */

/* comment about sum function jkhkdhfljkldjf
  kjsdkjflskj
*/

int sum(int a,int b);

/*comment about mul function */ 

int mul(int a,int b);

With the following regular expression I get two matches:

regex -> (?P<desc>(\/\*[\s\S]+?\*\/$))(?P<fun>\s*int\s*\b\w+\b\s*\(\w+\s+.+\s*(?:;$))

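Match #2:

For match #1 I get two comments, but I only need the last one, i.e. /* comment about sum function jkhkdhfljkldjf kjsdkjflskj */. I do not want /*dummy comment */ to be part of the match.

Please help me get the following output:

Match #1:

Match #2: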
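To see why match #1 swallows both comments, the sketch below runs the pattern from the question. It assumes the `re.MULTILINE` flag (the question does not say which flags were used) and a sample text with no trailing space after any `*/`; the names `text`, `pattern`, `matches` and `first_desc` are illustrative only.

```python
import re

# Sample text from the question, assumed to have no trailing whitespace after "*/".
text = """/*dummy comment */

/* comment about sum function jkhkdhfljkldjf
  kjsdkjflskj
*/

int sum(int a,int b);

/*comment about mul function */

int mul(int a,int b);"""

# The pattern from the question. re.MULTILINE is an assumption: it lets "$"
# match at the end of every line, not only at the end of the string.
pattern = re.compile(
    r"(?P<desc>(\/\*[\s\S]+?\*\/$))"
    r"(?P<fun>\s*int\s*\b\w+\b\s*\(\w+\s+.+\s*(?:;$))",
    re.MULTILINE,
)

matches = list(pattern.finditer(text))
first_desc = matches[0].group("desc")

# [\s\S]+? is lazy, but it is still allowed to run past the first "*/" when the
# "fun" part cannot match right after it, so "desc" grows until a comment that
# is followed by a declaration is reached.
print("number of matches:", len(matches))
print("comment openers inside the first desc:", first_desc.count("/*"))
```

The lazy quantifier only prefers the shortest expansion; it does not forbid crossing a `*/`. Because the match starts at the first `/*` in the text, the dummy comment ends up inside `desc`.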

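I could not debug your regex because it seems to be incorrectly formatted in the example, so here is a working snippet that shows one way to do it. Read the comments carefully, since they explain how the regex works.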

import re

# sample text as in the question
sample_str = """/*dummy comment */

/* comment about sum function */

int sum(int a,int b);

/*comment about mul function */ 

int mul(int a,int b);"""

# Match the regex below and capture its match into a backreference named “desc” (also backreference number 1) «(?P<desc>/\*\s*comment about (?P<func_name>[^\s]+?) function \*/)»
#    Match the character “/” literally «/»
#    Match the character “*” literally «\*»
#    Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «\s*»
#       Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
#    Match the character string “comment about ” literally (case sensitive) «comment about »
#    Match the regex below and capture its match into a backreference named “func_name” (also backreference number 2) «(?P<func_name>[^\s]+?)»
#       Match a single character that is NOT a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «[^\s]+?»
#          Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
#    Match the character string “ function ” literally (case sensitive) « function »
#    Match the character “*” literally «\*»
#    Match the character “/” literally «/»
# Outside the “desc” group, match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «\s*»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
# Match the carriage return character «\r*»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
# Match the line feed character «\n*»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
# Match the regex below and capture its match into a backreference named “fun” (also backreference number 3) «(?P<fun>(?P<return_type>[^\s]+?) (?P<func_name_2>[^\s]+?)\((?P<arguments>[^\)]+?)\))»
#    Match the regex below and capture its match into a backreference named “return_type” (also backreference number 4) «(?P<return_type>[^\s]+?)»
#       Match a single character that is NOT a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «[^\s]+?»
#          Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
#    Match the character “ ” literally « »
#    Match the regex below and capture its match into a backreference named “func_name_2” (also backreference number 5) «(?P<func_name_2>[^\s]+?)»
#       Match a single character that is NOT a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «[^\s]+?»
#          Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
#    Match the opening parenthesis character «\(»
#    Match the regex below and capture its match into a backreference named “arguments” (also backreference number 6) «(?P<arguments>[^\)]+?)»
#       Match any character that is NOT the closing parenthesis character «[^\)]+?»
#          Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
#    Match the closing parenthesis character «\)»

function_re = re.compile(r"(?P<desc>/\*\s*comment about (?P<func_name>[^\s]+?) function \*/)\s*\r*\n*(?P<fun>(?P<return_type>[^\s]+?) (?P<func_name_2>[^\s]+?)\((?P<arguments>[^\)]+?)\))")

for function_match in function_re.finditer(sample_str):
    # match start: function_match.start()
    # match end (exclusive): function_match.end()
    # matched text: function_match.group()
    print("\ndesc:\n\n{}\n".format(function_match.group("desc")))
    print("fun:\n\n{}\n\n--------".format(function_match.group("fun")))
    # Additional groups if you need them
    # print("Func Name 1: {}".format(function_match.group("func_name")))
    # print("Func Name 2: {}".format(function_match.group("func_name_2")))
    # print("Arguments  : {}".format(function_match.group("arguments")))

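I tried to edit this post but could not. Can you format your code?

Hi Lorenzo Persichetti, thanks for your reply, but the comment can be something like /*jkhdfhs dfhkjs*/ and it can also span multiple lines. Can we get only the one comment directly above the function declaration? The comment may be single-line or multi-line, but it always starts with /* and ends with */.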

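The output blocks from the question follow: what appears to be the current match #2, then the desired match #1 and match #2.

desc: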

/* comment about mul function */

fun:

 int mul(int a,int b);

desc:

/* comment about sum function jkhkdhfljkldjf
  kjsdkjflskj*/

fun:

int sum(int a,int b);

desc:

/* comment about mul function */

fun:

 int mul(int a,int b);
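One possible way to get only the comment directly above each declaration, whether it is single-line or multi-line, is to forbid `*/` inside the comment body so that `desc` can never span two comments. This is a sketch, not a full C parser: it assumes declarations of the simple form `type name(args);`, and the names `decl_re` and `results` are made up for illustration.

```python
import re

# Sample text from the question, including the multi-line comment.
sample_str = """/*dummy comment */

/* comment about sum function jkhkdhfljkldjf
  kjsdkjflskj
*/

int sum(int a,int b);

/*comment about mul function */ 

int mul(int a,int b);"""

decl_re = re.compile(
    r"""
    (?P<desc>/\*(?:(?!\*/)[\s\S])*\*/)   # exactly one comment: its body may not contain the closing marker
    \s*                                  # blank lines or trailing spaces before the declaration
    (?P<fun>\w+\s+\w+\s*\([^)]*\)\s*;)   # assumed shape: return type, name, argument list, semicolon
    """,
    re.VERBOSE,
)

# Collect (comment, declaration) pairs; the dummy comment is skipped because
# it is followed by another comment rather than by a declaration.
results = [(m.group("desc"), m.group("fun")) for m in decl_re.finditer(sample_str)]

for desc, fun in results:
    print("\ndesc:\n\n{}\n".format(desc))
    print("fun:\n\n{}\n\n--------".format(fun))
```

The negative lookahead `(?!\*/)` is checked before every character of the comment body, so the body stops at the first `*/`. When the text after that comment is not a declaration, the attempt fails and the search restarts at the next `/*`.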

The answer's original snippet prints:

desc:

/* comment about sum function */

fun:

int sum(int a,int b)

--------

desc:

/*comment about mul function */

fun:

int mul(int a,int b)

--------