Python: iterate over subfolders and convert files from txt to csv

python pandas loops

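For my current project, I plan to run through multiple subfolders, each of which contains the files num.txt and sub.txt (with different content in each folder). I have tried setting up a loop with for subdir, dirs, files in os.walk(rootdir): followed by a conversion step. The script runs, but it produces no results.

Is there a smart tweak that gets the conversion from txt to csv working? The code I am currently using is shown below: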

import pandas as pd
import os

# Directory of root folder
rootdir = '/Users/name/SEC'

# Iteration over sub-folders
for subdir, dirs, files in os.walk(rootdir):
    for file in files:

        # Conversion from TXT to CSV
        read_file1 = pd.read_csv("num.txt",delimiter="\t", sep=',', error_bad_lines=False, index_col=False, dtype='unicode')
        read_file1.to_csv("df1.csv")

        read_file2 = pd.read_csv("sub.txt",delimiter="\t", sep=',', error_bad_lines=False, index_col=False, dtype='unicode')
        read_file2.to_csv("df2.csv")

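You keep reading and writing the same two files. All you need to do is complete the path passed to pd.read_csv:

for subdir, dirs, files in os.walk(rootdir):
    # sep and delimiter are aliases, so only sep is passed;
    # on_bad_lines="skip" needs pandas >= 1.3 (older versions used error_bad_lines=False)
    read_file1 = pd.read_csv(os.path.join(subdir, "num.txt"), sep="\t",
                             on_bad_lines="skip", index_col=False, dtype='unicode')
    read_file1.to_csv(os.path.join(subdir, "df1.csv"))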
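
The point about the path can be checked without pandas. The following is a minimal sketch, not the answerer's code: the helper name find_inputs and the throwaway directory tree are assumptions made for illustration. It shows that os.walk only reports the directory and the bare file names, so each full path has to be built with os.path.join(subdir, name); a bare "num.txt" would be resolved against the current working directory instead.

```python
import os
import tempfile


def find_inputs(rootdir, name="num.txt"):
    """Return the full path of every file called `name` at or below rootdir."""
    found = []
    for subdir, _dirs, files in os.walk(rootdir):
        if name in files:
            # Join with subdir; a bare name would point into the
            # current working directory, not into the folder being visited.
            found.append(os.path.join(subdir, name))
    return sorted(found)


if __name__ == "__main__":
    # Build a small hypothetical tree with two subfolders and list the matches.
    with tempfile.TemporaryDirectory() as root:
        for folder in ("a", "b"):
            os.makedirs(os.path.join(root, folder))
            with open(os.path.join(root, folder, "num.txt"), "w") as fh:
                fh.write("x\ty\n1\t2\n")
        for path in find_inputs(root):
            print(path)
```

Each path returned this way can be handed to pd.read_csv directly, and the output path can be built the same way with os.path.join(subdir, "df1.csv").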

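You need to add the subdir path to the file names. There is no need to iterate over dirs or over files (so I set them both to "_"), because the for loop already visits each subdirectory exactly once.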

import os

import pandas as pd

# Directory of root folder
rootdir = '/Users/name/SEC'

# Iteration over sub-folders
for subdir, _, _ in os.walk(rootdir):
    # Conversion from TXT to CSV.
    # sep and delimiter are aliases, so only sep is passed;
    # on_bad_lines="skip" needs pandas >= 1.3 (older versions used error_bad_lines=False)
    try:
        read_file1 = pd.read_csv(os.path.join(subdir, "num.txt"), sep="\t", on_bad_lines="skip", index_col=False, dtype='unicode')
        read_file1.to_csv(os.path.join(subdir, "df1.csv"))
    except FileNotFoundError:
        # This directory has no num.txt; move on
        pass

    try:
        read_file2 = pd.read_csv(os.path.join(subdir, "sub.txt"), sep="\t", on_bad_lines="skip", index_col=False, dtype='unicode')
        read_file2.to_csv(os.path.join(subdir, "df2.csv"))
    except FileNotFoundError:
        # This directory has no sub.txt; move on
        pass

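Iterating into hidden directories such as ".ipynb_checkpoints" on linux is mostly harmless, but you can filter them out. When you do a top-down os.walk, you can prune subdirectories from the traversal by removing them from the "dirs" list in place. Something similar can be done on Windows using win32api.GetFileAttributes.

for subdir, dirs, _ in os.walk(rootdir):
    # Slice assignment edits the list os.walk itself uses, so hidden names are not visited
    dirs[:] = [name for name in dirs if not name.startswith(".")]

    # ... do the rest ...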
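
To make the pruning behaviour concrete, here is a small self-contained sketch. The generator name visible_subdirs and the temporary tree are hypothetical, chosen only for this illustration. It relies on the documented behaviour that, in the default top-down mode, os.walk only descends into the names left in dirs.

```python
import os
import tempfile


def visible_subdirs(rootdir):
    """Yield rootdir and every directory below it whose name does not start with a dot."""
    for subdir, dirs, _files in os.walk(rootdir):  # top-down is the default
        # In-place slice assignment: os.walk will not descend into removed names.
        dirs[:] = [name for name in dirs if not name.startswith(".")]
        yield subdir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "data"))
        os.makedirs(os.path.join(root, ".ipynb_checkpoints", "nested"))
        for path in visible_subdirs(root):
            # Print paths relative to the root so the output is easy to read.
            print(os.path.relpath(path, root))
```

Rebinding the name instead (dirs = [...]) would not prune anything, because os.walk keeps using its original list; that is why the slice assignment matters.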

You can use pathlib to join paths more compactly. Its Path object overrides the division operator to join path strings.

import os
from pathlib import Path

import pandas as pd

# Directory of root folder
rootdir = '/Users/name/SEC'

# Iteration over sub-folders
for subdir, dirs, _ in os.walk(rootdir):
    # Filter out hidden directories so os.walk does not descend into them
    dirs[:] = [name for name in dirs if not name.startswith(".")]
    subdir = Path(subdir)
    # Conversion from TXT to CSV.
    # sep and delimiter are aliases, so only sep is passed;
    # on_bad_lines="skip" needs pandas >= 1.3 (older versions used error_bad_lines=False)
    try:
        read_file1 = pd.read_csv(subdir/"num.txt", sep="\t", on_bad_lines="skip", index_col=False, dtype='unicode')
        read_file1.to_csv(subdir/"df1.csv")
    except FileNotFoundError:
        # This directory has no num.txt; move on
        pass

    try:
        read_file2 = pd.read_csv(subdir/"sub.txt", sep="\t", on_bad_lines="skip", index_col=False, dtype='unicode')
        read_file2.to_csv(subdir/"df2.csv")
    except FileNotFoundError:
        # This directory has no sub.txt; move on
        pass
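
As a variation on the pathlib idea, the files can also be located directly with Path.rglob instead of walking every directory. The sketch below is an assumption-laden alternative rather than the answer's method: the function name convert_tree is hypothetical, it uses the standard-library csv module in place of pandas (so it does not skip malformed lines the way the pandas call does), and it names each output after its input (num.csv, sub.csv) rather than df1.csv and df2.csv.

```python
import csv
import tempfile
from pathlib import Path


def convert_tree(rootdir, names=("num.txt", "sub.txt")):
    """Rewrite each tab-delimited file in `names` as a comma-separated .csv next to it."""
    root = Path(rootdir)
    written = []
    for name in names:
        for src in sorted(root.rglob(name)):
            # Skip anything that lives under a hidden directory.
            if any(part.startswith(".") for part in src.relative_to(root).parts):
                continue
            dst = src.with_suffix(".csv")
            with src.open(newline="", encoding="utf-8") as fin, \
                    dst.open("w", newline="", encoding="utf-8") as fout:
                # Read tab-separated rows and write them back comma-separated.
                csv.writer(fout).writerows(csv.reader(fin, delimiter="\t"))
            written.append(dst)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp) / "a"
        folder.mkdir()
        (folder / "num.txt").write_text("x\ty\n1\t2\n", encoding="utf-8")
        for path in convert_tree(tmp):
            print(path.relative_to(tmp))
```

Because only existing files are matched, no FileNotFoundError handling is needed in this variant; the trade-off is that the pandas options for dtype and bad-line handling are not available.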

Does this raise FileNotFoundError?

It just runs without any errors. However, the file type is not converted.

There is a current working directory when the script runs. You keep converting the "num.txt" in that directory. I will post a suggested solution.

That would be great, thank you very much in advance.

You may want to pass index=False to .to_csv to match what is in the text file.

Thanks for the input. I have tried the code: the line read_file1 = pd.read_csv(os.path.join(rootdir, dir, "num.txt"), delimiter="\t", sep=',', error_bad_lines=False, index_col=False, dtype='unicode') produces a syntax error. I am currently checking why...

There is no need for the second dirs loop. os.walk visits all directories. I could not remember exactly what os.walk returns; it should work now.

Thanks. I have run the code and it produces the error FileNotFoundError: [Errno 2] File /Users/name/SEC/.ipynb_checkpoints/num.txt does not exist: '/Users/malte.susen/SEC/.ipynb_checkpoints/num.txt', so for some reason the file cannot be found...

Your code assumes that all subdirectories contain these files. If that is not the case, the operation can be put in an exception handler. I will update.

All subfolders contain these files. In fact, the script is converting the files in all folders, but it still produces the error message. I guess the exception handler should solve the problem, as you mentioned.

Are you sure that /Users/name/SEC/.ipynb_checkpoints/num.txt exists? The error message says it does not. If you are running on linux, ".ipynb_checkpoints" may not be visible when you do something like "ls" (the rule in linux is that names starting with "." are hidden), but such directories still show up in program-level listing calls.