Extracting data on a match with a Python regular expression
I have the following multi-line string:
/*dummy comment */
/* comment about sum function jkhkdhfljkldjf
kjsdkjflskj
*/
int sum(int a,int b);
/*comment about mul function */
int mul(int a,int b);
If I use the following regular expression, I get two matches:
regex -> (?P<desc>(\/\*[\s\S]+?\*\/$))(?P<fun>\s*int\s*\b\w+\b\s*\(\w+\s+.+\s*(?:;$))
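Match #2:
For match #1 I get two comments, but I only need the last one, i.e. /* comment about sum function jkhkdhfljkldjf kjsdkjflskj */. I do not want /*dummy comment */ to be part of the match.
Please help me get the following output:
Match #1:
Match #2: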
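To see why match #1 swallows both comments, it may help to run the pattern from the question against the sample text. The sketch below assumes the pattern was compiled with re.MULTILINE (the question does not say so, but without it "$" could not match in the middle of the string); the names question_re, matches and first_desc are made up for illustration.

```python
import re

# Sample text from the question; the first comment belongs to no function.
sample_str = """/*dummy comment */
/* comment about sum function jkhkdhfljkldjf
kjsdkjflskj
*/
int sum(int a,int b);
/*comment about mul function */
int mul(int a,int b);"""

# The regex from the question. re.MULTILINE is an assumption: without it,
# "$" would only match at the very end of the string.
question_re = re.compile(
    r"(?P<desc>(\/\*[\s\S]+?\*\/$))"
    r"(?P<fun>\s*int\s*\b\w+\b\s*\(\w+\s+.+\s*(?:;$))",
    re.MULTILINE,
)

matches = list(question_re.finditer(sample_str))
first_desc = matches[0].group("desc")

# The lazy "+?" first stops at the end of the dummy comment, but no "int"
# declaration follows there. The engine then backtracks and lets [\s\S]+?
# grow across that "*/" until the rest of the pattern can match.
print(len(matches))            # number of matches found
print(first_desc.count("/*"))  # number of comment openers inside desc
```

A lazy quantifier only prefers the shortest match; it does not forbid a longer one, which is why the dummy comment ends up inside desc.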
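I could not debug your regex because it seems to be incorrectly formatted in your example, so here is a working snippet that shows how to do it. Read the comments carefully, since they explain how the regex works.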
import re
# sample text as in the question
sample_str = """/*dummy comment */
/* comment about sum function */
int sum(int a,int b);
/*comment about mul function */
int mul(int a,int b);"""
# Match the regex below and capture its match into a backreference named “desc” (also backreference number 1) «(?P<desc>/\*\s*comment about (?P<func_name>[^\s]+?) function \*/)»
# Match the character “/” literally «/»
# Match the character “*” literally «\*»
# Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «\s*»
# Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
# Match the character string “comment about ” literally (case sensitive) «comment about »
# Match the regex below and capture its match into a backreference named “func_name” (also backreference number 2) «(?P<func_name>[^\s]+?)»
# Match a single character that is NOT a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «[^\s]+?»
# Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
# Match the character string “ function ” literally (case sensitive) « function »
# Match the character “*” literally «\*»
# Match the character “/” literally «/»
# Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «\s*»
# Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
# Match the carriage return character «\r*»
# Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
# Match the line feed character «\n*»
# Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
# Match the regex below and capture its match into a backreference named “fun” (also backreference number 3) «(?P<fun>(?P<return_type>[^\s]+?) (?P<func_name_2>[^\s]+?)\((?P<arguments>[^\)]+?)\))»
# Match the regex below and capture its match into a backreference named “return_type” (also backreference number 4) «(?P<return_type>[^\s]+?)»
# Match a single character that is NOT a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «[^\s]+?»
# Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
# Match the character “ ” literally « »
# Match the regex below and capture its match into a backreference named “func_name_2” (also backreference number 5) «(?P<func_name_2>[^\s]+?)»
# Match a single character that is NOT a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «[^\s]+?»
# Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
# Match the opening parenthesis character «\(»
# Match the regex below and capture its match into a backreference named “arguments” (also backreference number 6) «(?P<arguments>[^\)]+?)»
# Match any character that is NOT the closing parenthesis character «[^\)]+?»
# Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
# Match the closing parenthesis character «\)»
function_re = re.compile(r"(?P<desc>/\*\s*comment about (?P<func_name>[^\s]+?) function \*/)\s*\r*\n*(?P<fun>(?P<return_type>[^\s]+?) (?P<func_name_2>[^\s]+?)\((?P<arguments>[^\)]+?)\))")
for function_match in function_re.finditer(sample_str):
    # match start: function_match.start()
    # match end (exclusive): function_match.end()
    # matched text: function_match.group()
    print("\ndesc:\n\n{}\n".format(function_match.group("desc")))
    print("fun:\n\n{}\n\n--------".format(function_match.group("fun")))
    # Additional groups if you need them
    # print("Func Name 1: {}".format(function_match.group("func_name")))
    # print("Func Name 2: {}".format(function_match.group("func_name_2")))
    # print("Arguments : {}".format(function_match.group("arguments")))
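I tried to edit this post, but without success. Can you format your code?
Hi Lorenzo Persichetti, thanks for your reply, but the comment can be something like /*jkhdfhs dfhkjs*/, or it can span multiple lines. Can we get only the single comment directly above the function declaration? The comment may be single-line or multi-line, but it always starts with /* and ends with */.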
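One possible way to handle the follow-up (arbitrary comment text, single-line or multi-line, and only the comment directly above the declaration) is to forbid the comment body from running past its own closing */. The sketch below is not from the original answer. It uses a negative lookahead for that, and it assumes that only whitespace separates the comment from the declaration and that a declaration looks like "type name(args);". The names function_re and results are illustrative.

```python
import re

# Sample text from the question, including the multi-line comment.
sample_str = """/*dummy comment */
/* comment about sum function jkhkdhfljkldjf
kjsdkjflskj
*/
int sum(int a,int b);
/*comment about mul function */
int mul(int a,int b);"""

function_re = re.compile(
    r"(?P<desc>/\*(?:(?!\*/)[\s\S])*\*/)"  # one comment; cannot run past its own "*/"
    r"\s*"                                 # only whitespace may separate the two
    r"(?P<fun>\w+\s+\w+\s*\([^)]*\)\s*;)"  # e.g. "int sum(int a,int b);"
)

# Collect (comment, declaration) pairs in document order.
results = [
    (m.group("desc"), m.group("fun"))
    for m in function_re.finditer(sample_str)
]

for desc, fun in results:
    print("desc:\n{}\nfun:\n{}\n--------".format(desc, fun))
```

Because (?!\*/) is checked before every character of the comment body, desc can only end at the first */ after its opening /*. The dummy comment is therefore tried on its own, fails because no declaration follows it, and is skipped. Return types made of several tokens (for example "unsigned int" or pointer types) would need a wider fun pattern.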
desc:
/* comment about mul function */
fun:
int mul(int a,int b);
desc:
/* comment about sum function jkhkdhfljkldjf
kjsdkjflskj*/
fun:
int sum(int a,int b);
desc:
/* comment about mul function */
fun:
int mul(int a,int b);
desc:
/* comment about sum function */
fun:
int sum(int a,int b)
--------
desc:
/*comment about mul function */
fun:
int mul(int a,int b)
--------