Dynamically loading tuples/lists from a settings file in Python


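I want to dynamically load lists/tuples from a settings file.

I need to write a crawler that crawls a website, but I want to know about the files it finds rather than the web pages.

I let the user specify these file types in the settings.py file, like this: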

# Document Types during crawling
textFiles = ['.doc', '.docx', '.log', '.msg', '.pages', '.rtf', '.txt', '.wpd', '.wps']
dataFiles = ['.csv', '.dat', '.efx', '.gbr', '.key', '.pps', '.ppt', '.pptx', '.sdf', '.tax2010', '.vcf', '.xml']
audioFiles = ['.3g2','.3gp','.asf','.asx','.avi','.flv','.mov','.mp4','.mpg','.rm','.swf','.vob','.wmv']


#What lists would you like to use ?
fileLists = ['textFiles', 'dataFiles', 'audioFiles']

In crawler.py I use the beautifulsoup module to find links in the HTML content and process them as follows:

for item in soup.find_all("a"):
    # we dont want some of them because it is just a link to the current page or the startpage
    if item['href'] in dontWantList:
        continue

    #check if link is a file based on the fileLists from the settings
    urlpath = urlparse.urlparse(item['href']).path
    ext = os.path.splitext(urlpath)[1]
    file = False
    for list in settings.fileLists:
        if ext in settings.list:
            file = True
            #found file link
            if self.verbose:
                messenger("Found a file of type: %s" % ext, Colors.PURPLE)
            if ext not in fileLinks:
                fileLinks.append(item['href'])

    #Only add the link if it is not a file
    if file is not True:
        links.append(item['href'])
    else:
        #Do not add the file to the other lists
        continue

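The following snippet raises an error:

for list in settings.fileLists:
    if ext in settings.list:

Obviously, because Python looks for an attribute literally named list on settings (settings.list) rather than using the name stored in the loop variable.

Is there a way to tell Python to look up the lists in the settings file dynamically?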

I think that instead of:

if ext in settings.list:

you need:

ext_list = getattr(settings, list)
if ext in ext_list:
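As a minimal sketch of the getattr approach, the snippet below looks up each list by the name stored in fileLists. It makes a few assumptions that are not in the original post: a types.SimpleNamespace stands in for the imported settings module so the example runs without a settings.py on disk, the helper name is_file_link and the loop variable list_name are hypothetical, and it uses Python 3's urllib.parse rather than the Python 2 urlparse module from the question.

```python
import os
from types import SimpleNamespace
from urllib.parse import urlparse

# Stand-in for `import settings` (assumption: keeps the sketch self-contained).
settings = SimpleNamespace(
    textFiles=['.doc', '.txt'],
    dataFiles=['.csv', '.xml'],
    fileLists=['textFiles', 'dataFiles'],
)


def is_file_link(href):
    """Return True if href's extension is in any list named in settings.fileLists."""
    ext = os.path.splitext(urlparse(href).path)[1]
    for list_name in settings.fileLists:
        # getattr resolves the attribute whose name is held in list_name,
        # e.g. getattr(settings, 'textFiles') is the same as settings.textFiles.
        ext_list = getattr(settings, list_name)
        if ext in ext_list:
            return True
    return False


print(is_file_link("http://example.com/docs/report.txt"))
print(is_file_link("http://example.com/index.html"))
```

With a real settings module the only change should be replacing the SimpleNamespace with import settings; getattr works the same way on a module object.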

Edit:

I agree with jonrsharpe's point about list, so I renamed it in my code.

Don't name your own variable list; you shadow the built-in. Also, using a set would make the membership test more efficient.
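To illustrate the set suggestion, here is a rough sketch that merges every configured list into one frozenset up front, so each link needs only a single membership test. The names load_extensions and FILE_EXTENSIONS are hypothetical, and a types.SimpleNamespace again stands in for the real settings module.

```python
from types import SimpleNamespace

# Stand-in for `import settings` (assumption); the extension groups are sets here.
settings = SimpleNamespace(
    textFiles={'.doc', '.txt'},
    dataFiles={'.csv', '.xml'},
    fileLists=['textFiles', 'dataFiles'],
)


def load_extensions(settings_obj):
    """Union every collection named in fileLists into one frozenset."""
    wanted = set()
    for list_name in settings_obj.fileLists:
        wanted.update(getattr(settings_obj, list_name))
    return frozenset(wanted)


# Built once, before crawling starts.
FILE_EXTENSIONS = load_extensions(settings)

print('.csv' in FILE_EXTENSIONS)
print('.html' in FILE_EXTENSIONS)
```

Because update accepts any iterable, this also works if settings.py keeps plain lists; only the merged collection needs to be a set for the faster lookup.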

Where does settings.list come from?

Thanks. I've changed my naming as well. My IDE wasn't very happy about it either :)