Python pandas - trying to store multiple .txt files in a .csv


python pandas


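I have a folder with about 500 .txt files. I want to store their contents in a CSV file with two columns: column 1 is the file name and column 2 is the file's contents as a string. So I would end up with a CSV file of 501 rows (one per file plus the header).

I looked around for similar questions and came up with the following code: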

import pandas as pd
from pandas.io.common import EmptyDataError
import os


def Aggregate_txt_csv(path):
    for files in os.listdir(path):
            with open(files, 'r') as file:
                try: 
                    df = pd.read_csv(file, header=None, delim_whitespace=True)
                except EmptyDataError:
                    df = pd.DataFrame()
                
            return df.to_csv('file.csv', index=False)

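However, it returns an empty .csv file. Am I doing something wrong?

Your code has several problems. One of them is that pd.read_csv is not opening file, because you are not passing the path to the given file. I think you should try playing around starting from this code: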

import os
import pandas as pd
from pandas.io.common import EmptyDataError


def Aggregate_txt_csv(path):
    files = os.listdir(path)
    df = []
    for file in files:
        try:
            d = pd.read_csv(os.path.join(path, file), header=None, delim_whitespace=True)
            d["file"] = file
        except EmptyDataError:
            d = pd.DataFrame({"file": [file]})
        df.append(d)
    df = pd.concat(df, ignore_index=True)
    df.to_csv('file.csv', index=False)
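The path problem described in this answer can be looked at in isolation. os.listdir returns bare file names, so each name has to be joined with the folder before it can be opened from another working directory. Note also that in the question's code the return sits inside the for loop, so the function stops after the first file. Below is a minimal sketch of just the listing step; `list_full_paths` is a hypothetical helper name, not something from the answer.

```python
import os


def list_full_paths(path):
    """Return the full paths of the regular files directly inside `path`."""
    full_paths = []
    for name in sorted(os.listdir(path)):  # os.listdir yields names, not paths
        full = os.path.join(path, name)    # join each name with its folder
        if os.path.isfile(full):           # skip sub-directories
            full_paths.append(full)
    return full_paths                      # return once, after the loop
```

Each returned path can then be passed to open() or pd.read_csv regardless of the current working directory.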

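Use pathlib
- .glob to find all the files
- When using path objects, file.stem returns the file name from the path
Use pd.concat to combine the dataframes in df_list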


from pathlib import Path
import pandas as pd

p = Path('e:/PythonProjects/stack_overflow')  # path to files
files = p.glob('*.txt')  # get all txt files
df_list = list()  # create an empty list for the dataframes
for file in files:  # iterate through each file
    with file.open('r') as f:
        text = '\n'.join([line.strip() for line in f.readlines()])  # join all rows in list as a single string separated with \n
    df_list.append(pd.DataFrame({'filename': [file.stem], 'contents': [text]}))  # create and append a dataframe
df_all = pd.concat(df_list)  # concat all the dataframes
df_all.to_csv('files.txt', index=False)  # save to csv
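Since the contents column holds multi-line strings, it may be worth checking that such text survives the trip through CSV. The sketch below writes a small frame to an in-memory buffer and reads it back; `roundtrip_csv` and the sample data are made up for illustration. It relies on pandas quoting fields that contain newlines or commas when writing, and parsing those quoted fields when reading.

```python
import io

import pandas as pd


def roundtrip_csv(df):
    """Write `df` to CSV text in memory, then parse that text back."""
    buffer = io.StringIO()
    df.to_csv(buffer, index=False)  # fields with newlines/commas get quoted
    buffer.seek(0)
    return pd.read_csv(buffer)


# Hypothetical sample standing in for two aggregated text files.
sample = pd.DataFrame({
    'filename': ['a', 'b'],
    'contents': ['first line\nsecond line', 'single line, with a comma'],
})
restored = roundtrip_csv(sample)
```

One caveat: with default settings an empty string is read back as NaN, so empty files need separate handling if that distinction matters.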
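I noticed there is already an answer, but I got it working with a relatively simple piece of code. I only edited how the files are read, and the dataframe is output successfully.

Most importantly, I append to a list here to avoid calling pandas' concatenate function too many times, since that is very bad for performance. Also, reading the files does not need read_csv, because the files have no set format. So '\n'.join(file.readlines()) can simply read the file in and extract all lines as one string.
Finally, I convert the list of dictionaries into the final dataframe and return the result.
EDIT: for paths that are not the current directory, I updated the code to join the path so that it can find the necessary files. Apologies for the confusion.

Thanks, but I get an error message that says "ParserError: Error tokenizing data. C error: Expected 2 fields in line 6, saw 4" at "---> 12 d = pd.read_csv(os.path.join(path, file), header=None, delim_whitespace=True)"

There was a typo, which I have corrected. Would you mind running the same line outside the function? I cannot reproduce the error.

Thanks - I think you meant df = Aggregate_txt_csv(path), because 'files' is not defined. When I change 'files' to the path, I get the following error: UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 126: character maps to <undefined>. I get that error on content = '\n'.join(file.readlines())

Ah yes, sorry, I wasn't sure of the name of your data folder; it is called files in my reply. It seems there is some unicode content in your files? Which line does this happen on?

When I run the function, it happens on the line that says content = '\n'.join(file.readlines())
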
import pandas as pd
from pandas.io.common import EmptyDataError
import os


def Aggregate_txt_csv(path):
    result = []
    print(os.listdir(path))
    for files in os.listdir(path):
        fullpath = os.path.join(path, files)
        if not os.path.isfile(fullpath):
            continue

        with open(fullpath, 'r', errors='replace') as file:
            try:
                content = '\n'.join(file.readlines())
                result.append({'title': files, 'body': content})
            except EmptyDataError:
                result.append({'title': files, 'body': None})
            
    df = pd.DataFrame(result)
    return df

df = Aggregate_txt_csv('files')
print(df)
df.to_csv('result.csv')
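A variation on the code above, offered as a sketch under stated assumptions rather than a replacement. Two details differ. First, readlines() keeps each line's trailing newline, so joining the lines with '\n' leaves a blank line between them; file.read() returns the text unchanged. Second, the 'charmap' error in the comments suggests the platform's default encoding was being used, so this version passes an explicit encoding. The function name `aggregate_txt`, the utf-8 default, and the .txt filter are assumptions for illustration.

```python
import os

import pandas as pd


def aggregate_txt(path, encoding='utf-8'):
    """Collect each .txt file in `path` into one row: file name and full text."""
    rows = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        # Assumption: only regular files ending in .txt are wanted.
        if not (os.path.isfile(full) and name.endswith('.txt')):
            continue
        # errors='replace' substitutes U+FFFD for undecodable bytes
        # instead of raising UnicodeDecodeError.
        with open(full, 'r', encoding=encoding, errors='replace') as fh:
            rows.append({'title': name, 'body': fh.read()})
    # Build the frame once from the list of dicts, as in the answer above.
    return pd.DataFrame(rows, columns=['title', 'body'])
```

Usage would look like aggregate_txt('files').to_csv('result.csv', index=False), where 'files' is the folder name used in the answer. If the real encoding of the files is known (for example cp1252), passing it as `encoding` avoids the replacement characters altogether.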