Python: reading and extracting data from a complex text file


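I am a beginner in Python and I am trying to use it to automate repetitive tasks. I am struggling with the following task. I have a bunch of text files formatted like this: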

Cluster (x,y,z)         size   size p-FWE   size p-FDR   size p-unc         mass   mass p-FWE   mass p-FDR   mass p-unc
-44 -58 +36              361     0.049000     0.030607     0.000068      3797.10     0.058000     0.036292     0.000081
-38 -84 +18              344     0.057000     0.030607     0.000079      3386.52     0.071000     0.036292     0.000107
-42 -06 +30              259     0.108000     0.045083     0.000175      3072.19     0.091000     0.036292     0.000141




Cluster -44 -58 +36  :
248 voxels (69%) covering 5% of atlas.sLOC l (Lateral Occipital Cortex, superior division Left)
51 voxels (14%) covering 5% of atlas.AG l (Angular Gyrus Left)
62 voxels (17%) covering 0% of atlas.not-labeled

Cluster -38 -84 +18  :
163 voxels (47%) covering 3% of atlas.sLOC l (Lateral Occipital Cortex, superior division Left)
107 voxels (31%) covering 5% of atlas.iLOC l (Lateral Occipital Cortex, inferior division Left)
25 voxels (7%) covering 1% of atlas.OP l (Occipital Pole Left)
49 voxels (14%) covering 0% of atlas.not-labeled

Cluster -42 -6 +30  :
89 voxels (34%) covering 2% of atlas.PreCG l (Precentral Gyrus Left)
1 voxels (0%) covering 0% of atlas.IFG oper l (Inferior Frontal Gyrus, pars opercularis Left)
169 voxels (65%) covering 0% of atlas.not-labeled

All clusters combined :
411 voxels (43%) covering 8% of atlas.sLOC l (Lateral Occipital Cortex, superior division Left)
107 voxels (11%) covering 5% of atlas.iLOC l (Lateral Occipital Cortex, inferior division Left)
89 voxels (9%) covering 2% of atlas.PreCG l (Precentral Gyrus Left)
51 voxels (5%) covering 5% of atlas.AG l (Angular Gyrus Left)
25 voxels (3%) covering 1% of atlas.OP l (Occipital Pole Left)
1 voxels (0%) covering 0% of atlas.IFG oper l (Inferior Frontal Gyrus, pars opercularis Left)
280 voxels (29%) covering 0% of atlas.not-labeled
I would like to read this into a data frame and format it as shown in the table below.

Unfortunately, this is far beyond my beginner skills, so I am asking for some help building a Python script. Thanks a lot for your help!

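Seed ROI | Cluster (x,y,z) | size | size p-FWE | size p-FDR | size p-unc | mass | mass p-FWE | mass p-FDR | mass p-unc | voxels | covering | region
FP_I | -44 -58 +36 | 361 | 0.049000 | 0.030607 | 0.000068 | 3797.10 | 0.058000 | 0.036292 | 0.000081 | 248 | 5 | atlas.sLOC l (Lateral Occipital Cortex, superior division Left)
FP_I | -44 -58 +36 | 361 | 0.049000 | 0.030607 | 0.000068 | 3797.10 | 0.058000 | 0.036292 | 0.000081 | 51 | 5 | atlas.AG l (Angular Gyrus Left)
FP_I | -44 -58 +36 | 361 | 0.049000 | 0.030607 | 0.000068 | 3797.10 | 0.058000 | 0.036292 | 0.000081 | 62 | 0 | atlas.not-labeled
FP_I | -38 -84 +18 | 344 | 0.057000 | 0.030607 | 0.000079 | 3386.52 | 0.071000 | 0.036292 | 0.000107 | 163 | 3 | atlas.sLOC l (Lateral Occipital Cortex, superior division Left)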
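Here is one way to do it. I copied the contents of your sample text file and saved them to a file. There are probably more efficient ways to do this, but this should work.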

You should get a list like this:

[{'Cluster (x,y,z)': '-44 -58 +36', 'size': ' 361', 'size p-FWE': '0.049000', 'size p-FDR': '0.030607', 'size p-unc': '0.000068', 'mass': ' 3797.10', 'mass p-FWE': '0.058000', 'mass p-FDR': '0.036292', 'mass p-unc': '0.000081'},
 {'Cluster (x,y,z)': '-38 -84 +18', 'size': ' 344', 'size p-FWE': '0.057000', 'size p-FDR': '0.030607', 'size p-unc': '0.000079', 'mass': ' 3386.52', 'mass p-FWE': '0.071000', 'mass p-FDR': '0.036292', 'mass p-unc': '0.000107'},
 {'Cluster (x,y,z)': '-42 -06 +30', 'size': ' 259', 'size p-FWE': '0.108000', 'size p-FDR': '0.045083', 'size p-unc': '0.000175', 'mass': ' 3072.19', 'mass p-FWE': '0.091000', 'mass p-FDR': '0.036292', 'mass p-unc': '0.000141'}]

Here is the updated code:

headers = ['Cluster (x,y,z)', 'size', 'size p-FWE', 'size p-FDR', 'size p-unc',
           'mass', 'mass p-FWE', 'mass p-FDR', 'mass p-unc']

with open("katerlo.txt", "r") as txtfile:
    data = txtfile.readlines()

final_data = []
for i, line in enumerate(data):
    if i == 0:
        continue  # skip the header line
    # I am using a crude, fixed-width approach, since the columns in your file sit at fixed positions.
    # I also output a list of dicts (JSON-like) instead of a table; I believe you can take it from there.

    if len(line.strip()) == 0:
        # Stop at the first blank line: I assume your main interest is the summary table,
        # so the per-cluster part at the bottom is not parsed.
        break

    cur_json = {}
    cur_json[headers[0]] = line[:11]  # the coordinates, e.g. "-44 -58 +36", are 11 characters wide
    line = line[24:]  # skip the wide gap between the coordinates and the "size" column
    remaining_col_arr = line.split("     ")  # the remaining columns are separated by five spaces
    for header, item in zip(headers[1:], remaining_col_arr):
        cur_json[header] = item.replace('\n', '')  # remove the trailing newline
    final_data.append(cur_json)
print(final_data)
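The slicing above depends on every column starting at the same character position. If that is not true for all of your files, one option is to split each line on whitespace instead. The sketch below assumes that the coordinates are always the first three tokens and that the summary table ends at the first blank line. `parse_summary`, `SAMPLE` and `VALUE_HEADERS` are illustrative names, and the values are converted to numbers instead of being kept as strings.

```python
# Hypothetical sample: the summary table from the question, inlined so this runs on its own.
SAMPLE = """Cluster (x,y,z)         size   size p-FWE   size p-FDR   size p-unc         mass   mass p-FWE   mass p-FDR   mass p-unc
-44 -58 +36              361     0.049000     0.030607     0.000068      3797.10     0.058000     0.036292     0.000081
-38 -84 +18              344     0.057000     0.030607     0.000079      3386.52     0.071000     0.036292     0.000107
-42 -06 +30              259     0.108000     0.045083     0.000175      3072.19     0.091000     0.036292     0.000141

Cluster -44 -58 +36  :
248 voxels (69%) covering 5% of atlas.sLOC l (Lateral Occipital Cortex, superior division Left)
"""

# Every column after the coordinates, in file order.
VALUE_HEADERS = ['size', 'size p-FWE', 'size p-FDR', 'size p-unc',
                 'mass', 'mass p-FWE', 'mass p-FDR', 'mass p-unc']


def parse_summary(lines):
    """Parse the summary table (header line, data rows, then a blank line) into dicts."""
    rows = []
    for line in lines[1:]:  # lines[0] is the header
        tokens = line.split()  # split on any run of whitespace
        if not tokens:
            break  # blank line: the per-cluster section starts after this
        row = {'Cluster (x,y,z)': ' '.join(tokens[:3])}
        for header, value in zip(VALUE_HEADERS, tokens[3:]):
            # 'size' is a voxel count; every other column is a real number
            row[header] = int(value) if header == 'size' else float(value)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in parse_summary(SAMPLE.splitlines()):
        print(row)
```

To use it on a real file, read the lines first, for example `with open("katerlo.txt") as f: rows = parse_summary(f.read().splitlines())`. Because whitespace splitting never depends on column widths, the coordinate string always comes out whole.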
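Your table is very hard to read. Could you edit it into a readable form?
I have made the changes. Does it look better? On my computer the table displays correctly. Thank you!
Thank you very much for the script. When I run it, I get the error message "replace expected at least 2 arguments, got 1". The error comes from this line: item=item.replace('\n') #replace endline character if it appears. There are much more efficient ways to do this though
My mistake, it should be item=item.replace('\n','')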
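The answer deliberately stops before the per-cluster section, but the target table in the question also needs the voxels, covering and region columns. The sketch below assumes the block layout shown in the question. It reads each `Cluster x y z :` block with regular expressions and then joins those entries to the summary rows by coordinates. The two sections write coordinates differently (`-42 -06 +30` in the table versus `-42 -6 +30` in the block header), so the sketch compares them as tuples of integers. The seed ROI (`FP_I`) does not appear anywhere in the file, so the caller has to supply it, for example from the file name. `parse_atlas_blocks`, `join_rows` and `coords_key` are illustrative names. The summary rows are expected to be dicts whose `'Cluster (x,y,z)'` value contains all three coordinates.

```python
import re

# Block headers such as "Cluster -44 -58 +36  :"
CLUSTER_LINE = re.compile(r'^Cluster\s+([-+]?\d+\s+[-+]?\d+\s+[-+]?\d+)\s*:$')
# Atlas lines such as "51 voxels (14%) covering 5% of atlas.AG l (Angular Gyrus Left)"
ATLAS_LINE = re.compile(r'^(\d+) voxels \((\d+)%\) covering (\d+)% of (.+)$')


def coords_key(text):
    """Normalize coordinates so '-42 -06 +30' and '-42 -6 +30' compare equal."""
    return tuple(int(part) for part in text.split())


def parse_atlas_blocks(lines):
    """Map each cluster's coordinates to its list of atlas entries.

    The "All clusters combined" block has no coordinates, so it is skipped.
    """
    blocks = {}
    current = None
    for raw in lines:
        line = raw.strip()
        header = CLUSTER_LINE.match(line)
        if header:
            current = coords_key(header.group(1))
            blocks[current] = []
        elif line.startswith('All clusters combined'):
            current = None  # stop collecting: these totals are not per cluster
        elif current is not None:
            entry = ATLAS_LINE.match(line)
            if entry:
                voxels, _pct, covering, region = entry.groups()
                blocks[current].append(
                    {'voxels': int(voxels), 'covering': int(covering), 'region': region}
                )
    return blocks


def join_rows(summary_rows, blocks, seed_roi):
    """One output row per (cluster, atlas region) pair, with the seed ROI in front."""
    rows = []
    for summary in summary_rows:
        key = coords_key(summary['Cluster (x,y,z)'])
        for entry in blocks.get(key, []):
            rows.append({'seed ROI': seed_roi, **summary, **entry})
    return rows


if __name__ == "__main__":
    # Hypothetical inputs: a shortened summary row and one atlas block from the question.
    summary_rows = [{'Cluster (x,y,z)': '-42 -06 +30', 'size': 259}]
    atlas_text = """Cluster -42 -6 +30  :
89 voxels (34%) covering 2% of atlas.PreCG l (Precentral Gyrus Left)
169 voxels (65%) covering 0% of atlas.not-labeled
"""
    for row in join_rows(summary_rows, parse_atlas_blocks(atlas_text.splitlines()), "FP_I"):
        print(row)
```

Each output row is a flat dict, so with pandas installed, `pandas.DataFrame(rows)` will build a data frame with one column per key, which matches the layout of the table in the question.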