Python: how to use os.walk or glob.glob to get files of all extension types in a directory

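I have code that detects the language of the files in a directory. However, it only handles the one extension named in the code. How can I detect the language of files with every extension in the directory (for example .pdf, .xlsx, .docx), not just the .txt files mentioned in the code? The code is attached for reference. I would like to know how to do this with glob and os.walk.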

import csv
from fnmatch import fnmatch
try:
    from langdetect import detect
except ImportError:
    detect = lambda _: '<dunno>'
import os

rootdir = '.'  # current directory
extension = '.txt'
file_pattern = '*' + extension

with open('output.csv', 'w', newline='', encoding='utf-8') as outfile:
    csvwriter = csv.writer(outfile)

    for dirpath, subdirs, filenames in os.walk(os.path.abspath(rootdir)):
        for filename in filenames:
            if fnmatch(filename, file_pattern):
                lang = detect(os.path.join(dirpath, filename))
                csvwriter.writerow([dirpath, filename, lang])

IIUC, you can replace your fnmatch check with:

eoi = ['*.pdf', '*.xlsx', '*.docx', '*.txt']     # extensions of interest list
if any(fnmatch(filename, ext) for ext in eoi):
    lang = ...
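
For reference, here is a minimal sketch of how such a filter could be wired into both os.walk and glob.glob. The names find_files_walk, find_files_glob and EXTENSIONS are illustrative and do not come from the original posts. The sketch compares extensions with os.path.splitext instead of fnmatch, which is a substitution on my part, and it treats extensions=None as "match every file".

```python
import glob
import os

# Hypothetical set of extensions of interest (lower-case, with leading dot).
EXTENSIONS = {'.pdf', '.xlsx', '.docx', '.txt'}


def find_files_walk(rootdir, extensions=None):
    """Yield paths under rootdir whose extension is in `extensions`.

    Passing extensions=None yields every file regardless of extension.
    """
    for dirpath, _subdirs, filenames in os.walk(rootdir):
        for filename in filenames:
            ext = os.path.splitext(filename)[1].lower()
            if extensions is None or ext in extensions:
                yield os.path.join(dirpath, filename)


def find_files_glob(rootdir, extensions=None):
    """Same idea using glob.glob with a recursive '**' pattern."""
    pattern = os.path.join(glob.escape(rootdir), '**', '*')
    for path in glob.glob(pattern, recursive=True):
        if not os.path.isfile(path):
            continue  # the '**/*' pattern also matches directories
        ext = os.path.splitext(path)[1].lower()
        if extensions is None or ext in extensions:
            yield path
```

Either generator can replace the two nested loops in the question, for example `for path in find_files_walk(rootdir, EXTENSIONS): ...`. One difference to keep in mind: by default glob skips names that start with a dot, whereas os.walk reports them.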

If you really mean what you wrote, "all" file extensions: just replace txt with a wildcard. But I guess you mean "more than one file extension, i.e. this list instead of .txt only: ['.pdf', '.xlsx', '.docx']". Right?

Works well. Thanks a lot. How can I avoid writing this part and then using "extension = '.txt'", so that it works for .txt and other file extensions?

Just add any file extension to the list used by the comprehension in the condition. Please see my edit and check whether it solves your problem.

Thanks. On another note, I don't think the code is reading the content of each file to decide which language it is in. How can I add that to the code?

I'm not quite sure IIUC... You mean that filtering the desired file extensions works now, but you have another question about the language detection part? OK, then you should perhaps ask a new question. I don't have any experience with langdetect.
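
On the last point raised in the comments: the code in the question passes the file path to detect, so the path string itself is what gets analysed. A rough sketch of reading the text first might look like the following. The helper name detect_file_language and the parameters detector and max_chars are hypothetical, and the sketch only covers plain-text files; .pdf, .docx and .xlsx files would need a separate text-extraction step that is not shown here.

```python
try:
    from langdetect import detect  # third-party; detect(text) returns a language code
except ImportError:
    detect = lambda _text: '<dunno>'  # same fallback as in the question


def detect_file_language(path, detector=detect, max_chars=5000):
    """Read up to max_chars of text from path and pass the text to detector.

    Assumes the file is plain text that can be decoded as UTF-8;
    undecodable bytes are dropped.
    """
    with open(path, encoding='utf-8', errors='ignore') as f:
        text = f.read(max_chars)
    if not text.strip():
        return '<empty>'  # nothing to analyse
    return detector(text)
```

In the question's loop this would replace `detect(os.path.join(dirpath, filename))` with `detect_file_language(os.path.join(dirpath, filename))`. As far as I know, langdetect can raise an exception for text with no usable features (digits only, for instance), so a real script may want to catch that case as well.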