Finding author names in multiple files and folders in Python

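Please help me work out how to use an else branch here. Whenever I add it, it prints "unknown" over and over in a loop.

import os

folderpath = 'D:/Workspace'
typeOfFile = [".c", ".C", ".cpp", ".CPP"]

for dirname, dirnames, filenames in os.walk(folderpath):
    for filename in filenames:
        if filename.endswith(tuple(typeOfFile)):
            for line in open(os.path.join(dirname, filename), "r").readlines():
                left, author, right = line.partition('author')
                if author:
                    name = right[:100]
                    combine = name.replace(" ", "")
                    remove = combine.strip(':')
                    print(remove)
                else:
                    print('unknown')

The reason is that when a file does not contain the string author, the script just skips that file and goes on to look for other authors. Sorry for my poor English. Thank you.

We can extract the name by building a regular-expression pattern.

Usually the author's name follows the keyword author in a comment. Some people prefer __author__ instead. It depends on what you usually see in practice; it is worth looking on GitHub at how people use author in their comments.

The author's name usually comes right after the keyword. Some people use nicknames, so it is not always purely alphabetic.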
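To make that variation concrete, here is a small sketch that applies str.partition, the same method the question uses, to a few header lines. The sample lines and the helper name text_after_keyword are made up for illustration and are not taken from the asker's files. The leftover separators, quotes and comment markers it prints are what a pattern has to deal with.

```python
def text_after_keyword(line, keyword="author"):
    """Return whatever follows the first occurrence of keyword, or None."""
    left, found, right = line.partition(keyword)
    return right if found else None

# Hypothetical header lines in a few common styles
samples = [
    "// author: jdoe_42",
    "__author__ = 'Jane Doe'",
    "/* author=xX_coder_Xx */",
    "int main(void) { return 0; }",
]

for line in samples:
    # repr() makes leading spaces and quotes visible
    print(repr(text_after_keyword(line)))
```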

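pattern = r'.*author[^:=\n]*[:=]+([^\n]+)'

In a regular expression, . means any character, + means one or more of the preceding item, and ^ at the start of a bracketed set means any character except the ones listed. You want to match one or more characters up to, but not including, the newline, since some people also put a space between first and last name. The parentheses capture the text that should be found after the literal text author: any characters are accepted before author, any characters except ":" and "=" (and the newline) are accepted right after it, and the ":" or "=" that follows marks where the captured part begins.

Besides opening the files, you also need to validate the format. Consider this quick example of how to use the regular expression to extract the author name.

import re

# simple case
data1 = """
author='helloWorld'

def hello()
    print "hello world"
    """
# case with two ::
data2 = """
__author__::'someone'

def hello()
    print "hello world"
    """
# case where we have numerical/alphabetic
data3 = """
__author__='someone201119'

def hello()
    print "hello world"
    """
# case where no author in code
data4 = """
def hello()
    print "hello world"
    """

for data in [data1, data2, data3, data4]:
    # re.DOTALL lets the leading .* skip over any lines before the author line
    m = re.match(r'.*author[^:=\n]*[:=]+([^\n]+)', data, re.DOTALL)
    if m:
        author = m.group(1).strip()
    else:
        author = 'unknown'
    print("author in this case is: %s" % author)

Output:

author in this case is: 'helloWorld'
author in this case is: 'someone'
author in this case is: 'someone201119'
author in this case is: unknown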
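The same match-or-fallback idea also addresses the original else question when files are read line by line: the fallback has to be decided once per file, after every line has been checked, rather than once per line. A minimal sketch, assuming the author appears on a single line in the form author: name or author = name. The function name find_author, the constant AUTHOR_RE and the case-insensitive flag are illustrative choices, not part of the original answer.

```python
import re

# Assumed format: "author", optional characters, then ':' or '=', then the name
AUTHOR_RE = re.compile(r"author[^:=\n]*[:=]+\s*(.+)", re.IGNORECASE)

def find_author(lines, default="unknown"):
    """Return the first author found in an iterable of lines, else default."""
    for line in lines:
        m = AUTHOR_RE.search(line)
        if m:
            return m.group(1).strip()
    # Reached only when no line matched, so the fallback happens once per file
    return default

print(find_author(["/* demo.c */", " * Author: jdoe", "int x;"]))  # jdoe
print(find_author(["int main(void) { return 0; }"]))               # unknown
```

An open file object can be passed as lines, since file objects iterate line by line. Anything else on the matched line, such as a closing comment marker, is captured too and would need extra cleanup.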

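Update

Your overall code would look like this:

import os
import re

folderpath = 'D:/Workspace'
typeOfFile = [".c", ".C", ".cpp", ".CPP"]

for dirname, dirnames, filenames in os.walk(folderpath):
    for filename in filenames:
        if filename.endswith(tuple(typeOfFile)):
            # read() returns the whole file as one string, which re.match expects
            with open(os.path.join(dirname, filename), "r") as f:
                data = f.read()
            m = re.match(r'.*author[^:=\n]*[:=]+([^\n]+)', data, re.DOTALL)
            if m:
                author = m.group(1).strip()
            else:
                author = 'unknown'
            print("author in this case is: %s in file %s" % (author, filename))

Actually, I need to find the author from the files. I am writing a script and putting the results into an Excel file. My boss gave me a project in which I need to find the authors; every file in every folder has to be checked. There are nearly 2000 C and C++ files, and some of them have no author. So when I write to the Excel file, some files end up with the wrong author, because all the authors above stack up. That is why I need the else branch to print unknown. Thank you for your answer.

It seems you missed the point of this answer. The script in the update section does what you describe. The answer explains how to extract the author in the different cases without dwelling on how to open and read the files, assuming you already know that part well. Check the update again; your code would look like this.

Why do I get an error from my Python library? Something like "return _compile(pattern, flags).match(string) TypeError: expected string or buffer", File "D:\Python27\lib\re.py", line 137, in match.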
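The TypeError quoted in the comments is what re.match raises when it is given a list instead of a string: readlines() returns a list of lines, whereas read() returns the file's whole content as one string. A minimal sketch of the difference, using an in-memory io.StringIO object as a stand-in for a real file (the sample content is made up):

```python
import io
import re

pattern = r'.*author[^:=\n]*[:=]+([^\n]+)'
content = u"/* demo.c */\n// author: jdoe\nint x;\n"

# readlines() gives a list of strings, which re.match cannot scan
lines = io.StringIO(content).readlines()
try:
    re.match(pattern, lines, re.DOTALL)
    list_result = "matched"
except TypeError:
    list_result = "TypeError"

# read() gives one string, so the same call succeeds
text = io.StringIO(content).read()
m = re.match(pattern, text, re.DOTALL)

print(list_result)          # TypeError
print(m.group(1).strip())   # jdoe
```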