Python: converting a text file to a DataFrame

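I have .TX0 files (a kind of CSV text file) and converted them to .txt files using Python's readlines(), open(filename, 'w') and so on. I now have the newly saved txt file, but when I try to convert it to a DataFrame I get only one column. The result looks like this: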

Empty DataFrame
Columns: [ '"Software Version:", 6.3.2.0646, Date:, 19/08/2015 09:26:04\n',  '"Reprocess Number:", vma2:  261519, Unnamed: 7, \n',  '"Sample Name:",  , Data Acquisition Time:, 18/08/2015 17:23:23\n',  '"Instrument Name:", natural gas (PE ASXL-TCD/FID), Channel:, B\n',  '"Rack/Vial:", 0, 0.1, Operator:, joey.walker\n',  '"Sample Amount:", 1.000000, Dilution Factor:, 1.000000\n',  '"Cycle:", 1, Result File :, \\\\vma2\\TotalChrom\11170_he_tcd001.rst \n',  '"Sequence File :", \\\\vma\C1_C2_binary.seq \n',  '"===================================================================================================================================="\n',  '""\n',  '""\n'.1,  '"condensate analysis (HP4890 Optic - FID)"\n',  '"Peak", Component, Time, Area, Height, BL\n',  '"#", Name, [min], [uV*sec], [uV], \n'.1,  '------, ------, ------.1, ------.2, ------.3, ------\n',  '1, Unnamed: 55, 0.810, 706.42, 304.38, *BB\n',  '2, CH4, 0.900, 1113518.24, 495918.41, *BB\n'.1,  '3, C2H6, 1.373, 901670.23, 295381.12, *BB\n'.2,  '"", Unnamed: 73, Unnamed: 74, ------.4, ------.5, \n'.2,  '"".1, Unnamed: 79, Unnamed: 80, 2015894.89, 791603.91, \n'.3,  '"Missing Component Report"\n',  '"Component", Expected Retention (Calibration File)\n',  '------.1, ------\n'.1,  '"All components were found"\n',  '"Report stored in ASCII file :", C:\\Shared Folders\\TotalChrom\\11170_he_tcd001.TX0 \n']]
Index: []
For legibility:

Empty DataFrame
Columns: [
'"Software Version:", 6.3.2.0646, Date:, 19/08/2015 09:26:04\n',
'"Reprocess Number:", vma2:  261519, Unnamed: 7, \n',
'"Sample Name:",  , Data Acquisition Time:, 18/08/2015 17:23:23\n',
'"Instrument Name:", natural gas (PE ASXL-TCD/FID), Channel:, B\n',
'"Rack/Vial:", 0, 0.1, Operator:, joey.walker\n',
'"Sample Amount:", 1.000000, Dilution Factor:, 1.000000\n',
'"Cycle:", 1, Result File :, \\\\vma2\\TotalChrom\11170_he_tcd001.rst \n',
'"Sequence File :", \\\\vma\C1_C2_binary.seq \n',
'"===================================================================================================================================="\n',
'""\n',
'""\n'.1,
'"condensate analysis (HP4890 Optic - FID)"\n',
'"Peak", Component, Time, Area, Height, BL\n',
'"#", Name, [min], [uV*sec], [uV], \n'.1,
'------, ------, ------.1, ------.2, ------.3, ------\n',
'1, Unnamed: 55, 0.810, 706.42, 304.38, *BB\n',
'2, CH4, 0.900, 1113518.24, 495918.41, *BB\n'.1,
'3, C2H6, 1.373, 901670.23, 295381.12, *BB\n'.2,
'"", Unnamed: 73, Unnamed: 74, ------.4, ------.5, \n'.2,
'"".1, Unnamed: 79, Unnamed: 80, 2015894.89, 791603.91, \n'.3,
'"Missing Component Report"\n',
'"Component", Expected Retention (Calibration File)\n',
'------.1, ------\n'.1,
'"All components were found"\n',
'"Report stored in ASCII file :", C:\\Shared Folders\\TotalChrom\\11170_he_tcd001.TX0 \n']]
Index: []

As you can see, the text is comma-separated. Is there a way to load it into a DataFrame with one column per comma-separated field?

Thanks,

J

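You can try the following function; it loads all the data from a local CSV file:

pd.read_csv()

For more details, see the pandas documentation for read_csv.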

You can try the following code to convert the text file to a DataFrame:

data = pd.read_csv('file.txt', sep=',')

Hopefully it is self-explanatory.
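With default settings, read_csv treats the first line as the header and infers the number of columns from it, so in a report like the one in the question, rows wider than the first line typically raise a tokenizing error. A minimal sketch of one workaround, assuming the file really is comma-separated with one record per line: pass header=None together with an explicit list of column labels at least as long as the widest row, so that shorter rows are padded with NaN. The sample text and the name max_fields are hypothetical stand-ins modelled on the question's output, not the real file.

```python
import io

import pandas as pd

# Hypothetical sample modelled on the report in the question (not the real file).
sample = """\
"Software Version:",6.3.2.0646,Date:,19/08/2015 09:26:04
"Sample Amount:",1.000000,Dilution Factor:,1.000000
"Peak",Component,Time,Area,Height,BL
1,,0.810,706.42,304.38,*BB
2,CH4,0.900,1113518.24,495918.41,*BB
3,C2H6,1.373,901670.23,295381.12,*BB
"""

# Assumption: no row in the file has more than 6 fields.
max_fields = 6

df = pd.read_csv(
    io.StringIO(sample),            # a file path such as 'file.txt' can be passed instead
    header=None,                    # do not treat the first line as column names
    names=list(range(max_fields)),  # fixed labels 0..5; shorter rows are padded with NaN
)

print(df.shape)
print(df)
```

Every column then holds a mix of labels and numbers as strings, so this is mainly useful as a first look at the file before selecting the rows of interest.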

Here is a more general answer to this problem:

import re
import pandas as pd

# First, open the file and read it line by line,
# stripping the trailing newline (and surrounding whitespace) from each line.
with open('file.txt', 'r') as f:
    lines = [line.strip() for line in f]

# Build a DataFrame (this example assumes you want two columns).
rows = []
first_col = ""
for line in lines:
    # Use "if" and "replace" when you need conditions to manipulate the text data.
    if 'X' in line:
        first_col = line.replace('X', "")
    else:
        # Define how the column values are extracted; here the second column
        # is the line with everything from " (" onward removed.
        second_col = re.sub(r' \(.*', "", line)
        # Each line without an 'X' produces one row.
        rows.append([first_col, second_col])

df_result = pd.DataFrame(rows, columns=['first_col', 'second_col'])
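To make the behaviour of this pattern concrete, here is a small usage sketch. The function name lines_to_frame and the sample lines are made up for illustration; the only assumption carried over from the answer is that lines containing 'X' start a new group and every other line becomes one row.

```python
import re

import pandas as pd


def lines_to_frame(lines):
    """Pair each ordinary line with the most recent line that contained 'X'."""
    rows = []
    first_col = ""
    for line in lines:
        if "X" in line:
            # A marker line: remember it (without the 'X') for the rows that follow.
            first_col = line.replace("X", "")
        else:
            # A data line: drop everything from " (" onward and emit one row.
            rows.append([first_col, re.sub(r" \(.*", "", line)])
    return pd.DataFrame(rows, columns=["first_col", "second_col"])


# Made-up input: lines containing 'X' start a new group.
sample_lines = ["XFruits", "apple (red)", "banana (yellow)", "XVegetables", "carrot"]
df_result = lines_to_frame(sample_lines)
print(df_result)
```

Collecting the rows in a list and building the DataFrame once at the end avoids growing the frame row by row, which tends to be slow for long files.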

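The problem is that it treats the text file as a single column, so a DataFrame cannot be built from it. Is there a way to split the text file into comma-separated columns and rows?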
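One possible way to do the splitting, sketched under assumptions: let the standard csv module break each line into fields, locate the row whose first field is "Peak", and keep only the rows after it that start with a peak number. The function name extract_peak_table and the embedded report are hypothetical; the report is a shortened reconstruction of what the original .TX0 file might look like, based on the output in the question, and the real layout may differ.

```python
import csv
import io

import pandas as pd

# Hypothetical, shortened reconstruction of the report (not the real file).
report = """\
"Software Version:",6.3.2.0646,Date:,19/08/2015 09:26:04
"condensate analysis (HP4890 Optic - FID)"
"Peak",Component,Time,Area,Height,BL
"#",Name,[min],[uV*sec],[uV],
------,------,------,------,------,------
1,,0.810,706.42,304.38,*BB
2,CH4,0.900,1113518.24,495918.41,*BB
3,C2H6,1.373,901670.23,295381.12,*BB
"",,,------,------,
"",,,2015894.89,791603.91,
"Missing Component Report"
"""


def extract_peak_table(text):
    """Return the peak table of a report as a DataFrame with numeric columns."""
    # Split every line into fields; quoted fields are unquoted by csv.reader.
    rows = list(csv.reader(io.StringIO(text), skipinitialspace=True))
    # Assumption: the table header is the first row whose first field is "Peak".
    start = next(i for i, row in enumerate(rows) if row and row[0] == "Peak")
    header = rows[start]
    # Keep only rows as wide as the header whose first field is a peak number;
    # this skips the units row, the dashed separators and the totals row.
    body = [
        row for row in rows[start + 1:]
        if len(row) == len(header) and row[0].strip().isdigit()
    ]
    table = pd.DataFrame(body, columns=header)
    numeric = ["Peak", "Time", "Area", "Height"]
    table[numeric] = table[numeric].apply(pd.to_numeric)
    return table


peaks = extract_peak_table(report)
print(peaks)
```

For a file on disk, the text could be obtained with open(path).read() on the original .TX0 file, which would avoid the intermediate .txt copy altogether.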