Python regex in commented code
I am trying to match open-source license types in the commented code at the top of most files. However, I am having trouble with cases where the required string (for example, Lesser General Public License) spans two lines. For example, see the license below:
* Copyright (c) Codice Foundation
* <p/>
* This is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser
* General Public License as published by the Free Software Foundation, either version 3 of the
* License, or any later version.
* <p/>
* This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
* even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Lesser General Public License for more details. A copy of the GNU Lesser General Public License
* is distributed along with this program and can be found at
* <http://www.gnu.org/licenses/lgpl.html>.
*/
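Because the number of spaces in the commented code is unknown, and the comment characters differ between languages, I cannot use a regex lookbehind. My current regexes look like this: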
self._cr_license_re['GNU'] = re.compile(r'\sGNU\D')
self._cr_license_re['MIT License'] = re.compile(r'MIT License|Licensed MIT|\sMIT\D')
self._cr_license_re['OpenSceneGraph Public License'] = re.compile(r'OpenSceneGraph Public License', re.IGNORECASE)
self._cr_license_re['Artistic License'] = re.compile(r'Artistic License', re.IGNORECASE)
self._cr_license_re['LGPL'] = re.compile(r'\sLGPL\s|Lesser General Public License', re.IGNORECASE)
self._cr_license_re['BSD'] = re.compile(r'\sBSD\D')
self._cr_license_re['Unspecified OS'] = re.compile(r'free of charge', re.IGNORECASE)
self._cr_license_re['GPL'] = re.compile(r'\sGPL\D|(?<!Lesser)\sGeneral Public License', re.IGNORECASE)
self._cr_license_re['Apache License'] = re.compile(r'Apache License', re.IGNORECASE)
self._cr_license_re['Creative Commons'] = re.compile(r'\sCC\D')
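To make the failure concrete, here is a minimal sketch that runs the LGPL and GPL patterns from the list above against two lines of the header. The variable names (`lgpl_re`, `gpl_re`, `header`) are made up for the illustration and stand in for the entries of `self._cr_license_re`.

```python
import re

# Patterns copied from the list above, written as raw strings.
lgpl_re = re.compile(r'\sLGPL\s|Lesser General Public License', re.IGNORECASE)
gpl_re = re.compile(r'\sGPL\D|(?<!Lesser)\sGeneral Public License', re.IGNORECASE)

# Two lines of the header, where the license name is split by a line break.
header = (
    " * modify it under the terms of the GNU Lesser\n"
    " * General Public License as published by the Free Software Foundation\n"
)

# "Lesser" and "General" are separated by "\n * ", so the LGPL phrase is not found.
lgpl_found = lgpl_re.search(header) is not None

# The text right before " General" is "*", not "Lesser", so the negative
# lookbehind does not block the match and the header is reported as GPL.
gpl_found = gpl_re.search(header) is not None

print(lgpl_found, gpl_found)  # False True
```

So on a split line the header is not only missed as LGPL, it is also misreported as plain GPL.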
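You can match the following pattern and replace it with a space:
\s*\*\s*\/?
This should put the multi-line comment on a single line, and you can then search for the license in it.

"If only there was a way to glue lines into one long string"? What is the problem?

Replace all literal spaces in 'OpenSceneGraph Public License' (and everywhere else) with \s+.

That is a good suggestion. However, the regex above did not remove the newlines (\n) for me. What finally worked was:
text = fid.read().replace('\n', '')
fin_text = re.sub(r'\s*\*\s*\/?', ' ', text)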
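The two suggestions above can be combined into a rough sketch: strip the comment decoration first, then search with patterns that tolerate any run of whitespace. It assumes `*`-style comments only; other comment characters (`#`, `//`, and so on) would need their own decoration pattern. The helper names `normalize_header` and `flexible` are hypothetical and not part of the original code.

```python
import re

# Comment decoration: optional whitespace, a '*', optional whitespace, optional '/'.
# This is the pattern from the answer; it only covers '*'-style comments.
COMMENT_DECORATION_RE = re.compile(r'\s*\*\s*/?')


def normalize_header(text):
    """Collapse a '*'-style comment block into one line of space-separated words."""
    text = COMMENT_DECORATION_RE.sub(' ', text)
    # Collapse any remaining whitespace runs (including newlines) into single spaces.
    return re.sub(r'\s+', ' ', text).strip()


def flexible(phrase):
    """Hypothetical helper: compile a phrase so each space matches any whitespace run."""
    parts = [re.escape(word) for word in phrase.split()]
    return re.compile(r'\s+'.join(parts), re.IGNORECASE)


header = (
    " * modify it under the terms of the GNU Lesser\n"
    " * General Public License as published by the Free Software Foundation\n"
)

flat = normalize_header(header)
lgpl_re = flexible('Lesser General Public License')

print(flat)
print(lgpl_re.search(flat) is not None)  # True
```

Because the text is flattened before matching, a fixed-width lookbehind such as `(?<!Lesser)` in the GPL pattern would also see "Lesser" directly before " General" again. Note that the decoration pattern removes every `*` in the text, which is acceptable for license detection but would damage content where asterisks are meaningful.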