Python regex in commented code


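I am trying to match open-source license types in the commented code at the beginning of most files. However, I am having difficulty with cases where the desired string (e.g., Lesser General Public License) spans two lines. For example, see the license below: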

 * Copyright (c) Codice Foundation
 * <p/>
 * This is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser
 * General Public License as published by the Free Software Foundation, either version 3 of the
 * License, or any later version.
 * <p/>
 * This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
 * even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details. A copy of the GNU Lesser General Public License
 * is distributed along with this program and can be found at
 * <http://www.gnu.org/licenses/lgpl.html>.
 */
Using a regex lookbehind is not feasible, because the number of spaces in the commented code is unknown and the comment characters differ between languages. Examples of my current regexes are below:

self._cr_license_re['GNU']                            = re.compile('\sGNU\D')
self._cr_license_re['MIT License']                    = re.compile('MIT License|Licensed MIT|\sMIT\D')
self._cr_license_re['OpenSceneGraph Public License']  = re.compile('OpenSceneGraph Public License', re.IGNORECASE)
self._cr_license_re['Artistic License']               = re.compile('Artistic License', re.IGNORECASE)
self._cr_license_re['LGPL']                           = re.compile('\sLGPL\s|Lesser General Public License', re.IGNORECASE)
self._cr_license_re['BSD']                            = re.compile('\sBSD\D')
self._cr_license_re['Unspecified OS']                 = re.compile('free of charge', re.IGNORECASE)
self._cr_license_re['GPL']                            = re.compile('\sGPL\D|(?<!Lesser)\sGeneral Public License', re.IGNORECASE)
self._cr_license_re['Apache License']                 = re.compile('Apache License', re.IGNORECASE)
self._cr_license_re['Creative Commons']               = re.compile('\sCC\D')
You can match the following pattern and replace each match with a space:

\s*\*\s*\/?

This should put the multi-line comment on a single line, after which you can search it for the license.
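A minimal sketch of that idea, assuming `*`-prefixed block comments like the one in the question. The names `COMMENT_DECORATION`, `LGPL`, and `flatten_comment` are hypothetical and only serve the illustration; the LGPL pattern is the one from the question.

```python
import re

# Optional whitespace, a literal '*', optional whitespace, optional '/'.
# Because \s also matches '\n', this removes the line break together with the '*' prefix.
COMMENT_DECORATION = re.compile(r'\s*\*\s*/?')

# LGPL pattern taken from the question.
LGPL = re.compile(r'\sLGPL\s|Lesser General Public License', re.IGNORECASE)


def flatten_comment(text):
    """Replace every comment decoration with a single space (hypothetical helper)."""
    return COMMENT_DECORATION.sub(' ', text)


header = """/**
 * This is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser
 * General Public License as published by the Free Software Foundation, either version 3 of the
 * License, or any later version.
 */"""

flat = flatten_comment(header)

# The license name is split across two lines in the raw header, so the pattern misses it.
found_before = LGPL.search(header) is not None
# After flattening, the name sits on one line and the pattern finds it.
found_after = LGPL.search(flat) is not None
```

Note that the substitution removes every `*` in the text, not only comment prefixes, so it is best applied to the header comment alone rather than to a whole source file. Comment styles that use other characters (such as `#` or `//`) would need their own pattern.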

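"If only there were a way to glue the lines into one long string"? What is the problem? Replace all literal spaces in "OpenSceneGraph Public License" (and everywhere else) with \s+.

This is a great suggestion. However, the regex above did not remove the newline characters (\n). What ended up working was:

text = fid.read().replace('\n', '')
fin_text = re.sub(r'\s*\*\s*\/?', ' ', text)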
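The `\s+` suggestion from the comments can be sketched as follows. This is only an illustration; `phrase_pattern` is a hypothetical helper, not part of the original code. It makes a phrase tolerant of line wraps and repeated spaces, but it does not skip comment characters such as `*`, so the decoration still has to be stripped first.

```python
import re


def phrase_pattern(phrase):
    """Build a regex in which the gaps between words match any run of whitespace."""
    words = phrase.split()
    return re.compile(r'\s+'.join(re.escape(word) for word in words), re.IGNORECASE)


pattern = phrase_pattern('Lesser General Public License')

# A line wrap with plain indentation is covered by \s+ ...
wrapped = "under the terms of the GNU Lesser\n    General Public License as published"
matches_wrapped = pattern.search(wrapped) is not None

# ... but a '*' comment prefix between the words is not whitespace, so it is not matched.
starred = "under the terms of the GNU Lesser\n * General Public License as published"
matches_starred = pattern.search(starred) is not None
```

Combining both steps, stripping the comment decoration and then matching with `\s+` between words, keeps the license patterns independent of how the header happens to be wrapped.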