Python: reading and extracting data from a complex text file
I am a Python beginner trying to use it to automate repetitive tasks. I am struggling with the following task: I have a bunch of text files in the following format:
Cluster (x,y,z) size size p-FWE size p-FDR size p-unc mass mass p-FWE mass p-FDR mass p-unc
-44 -58 +36 361 0.049000 0.030607 0.000068 3797.10 0.058000 0.036292 0.000081
-38 -84 +18 344 0.057000 0.030607 0.000079 3386.52 0.071000 0.036292 0.000107
-42 -06 +30 259 0.108000 0.045083 0.000175 3072.19 0.091000 0.036292 0.000141
Cluster -44 -58 +36 :
248 voxels (69%) covering 5% of atlas.sLOC l (Lateral Occipital Cortex, superior division Left)
51 voxels (14%) covering 5% of atlas.AG l (Angular Gyrus Left)
62 voxels (17%) covering 0% of atlas.not-labeled
Cluster -38 -84 +18 :
163 voxels (47%) covering 3% of atlas.sLOC l (Lateral Occipital Cortex, superior division Left)
107 voxels (31%) covering 5% of atlas.iLOC l (Lateral Occipital Cortex, inferior division Left)
25 voxels (7%) covering 1% of atlas.OP l (Occipital Pole Left)
49 voxels (14%) covering 0% of atlas.not-labeled
Cluster -42 -6 +30 :
89 voxels (34%) covering 2% of atlas.PreCG l (Precentral Gyrus Left)
1 voxels (0%) covering 0% of atlas.IFG oper l (Inferior Frontal Gyrus, pars opercularis Left)
169 voxels (65%) covering 0% of atlas.not-labeled
All clusters combined :
411 voxels (43%) covering 8% of atlas.sLOC l (Lateral Occipital Cortex, superior division Left)
107 voxels (11%) covering 5% of atlas.iLOC l (Lateral Occipital Cortex, inferior division Left)
89 voxels (9%) covering 2% of atlas.PreCG l (Precentral Gyrus Left)
51 voxels (5%) covering 5% of atlas.AG l (Angular Gyrus Left)
25 voxels (3%) covering 1% of atlas.OP l (Occipital Pole Left)
1 voxels (0%) covering 0% of atlas.IFG oper l (Inferior Frontal Gyrus, pars opercularis Left)
280 voxels (29%) covering 0% of atlas.not-labeled
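I would like to read this data into a data frame and format it as described in the table below.
Unfortunately, this is far beyond my beginner abilities, so I am asking for some help building the Python script.
Many thanks in advance.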
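Seed ROI | Cluster (x,y,z) | size | size p-FWE | size p-FDR | size p-unc | mass | mass p-FWE | mass p-FDR | mass p-unc | voxels | covering | region
FP_I | -44 -58 +36 | 361 | 0.049000 | 0.030607 | 0.000068 | 3797.10 | 0.058000 | 0.036292 | 0.000081 | 248 | 5 | atlas.sLOC l (Lateral Occipital Cortex, superior division Left)
FP_I | -44 -58 +36 | 361 | 0.049000 | 0.030607 | 0.000068 | 3797.10 | 0.058000 | 0.036292 | 0.000081 | 51 | 5 | atlas.AG l (Angular Gyrus Left)
FP_I | -44 -58 +36 | 361 | 0.049000 | 0.030607 | 0.000068 | 3797.10 | 0.058000 | 0.036292 | 0.000081 | 62 | 0 | atlas.not-labeled
FP_I | -38 -84 +18 | 344 | 0.057000 | 0.030607 | 0.000079 | 3386.52 | 0.071000 | 0.036292 | 0.000107 | 163 | 3 | atlas.sLOC l (Lateral Occipital Cortex, superior division Left)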
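Here is one way to do it. I copied the contents of your sample text file and saved them to a file. There may be more efficient ways to do this, but it should work. You should get a list like this: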
[{'Cluster (x,y,z)': '-44 -58 +3', 'size': ' 361', 'size p-FWE': '0.049000', 'size p-FDR': '0.030607', 'size p-unc': '0.000068', 'mass': ' 3797.10', 'mass p-FWE':
'0.058000', 'mass p-FDR': '0.036292', 'mass p-unc': '0.000081'}, {'Cluster (x,y,z)': '-38 -84 +1', 'size': ' 344', 'size p-FWE': '0.057000', 'size p-FDR': '0.030607', 'size p-unc': '0.000079', 'mass': ' 3386.52', 'mass p-FWE': '0.071000', 'mass p-FDR': '0.036292', 'mass p-unc': '0.000107'}, {'Cluster (x,y,z)': '-42 -06 +3',
'size': ' 259', 'size p-FWE': '0.108000', 'size p-FDR': '0.045083', 'size p-unc': '0.000175', 'mass': ' 3072.19', 'mass p-FWE': '0.091000', 'mass p-FDR': '0.036292', 'mass p-unc': '0.000141'}]
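Every value in that list is still a string, and some carry a leading space (for example ' 361'). A minimal sketch of converting them to numbers before further use is shown here; the helper name `to_number` is hypothetical, and the record is an abbreviated copy of the first entry above.

```python
# One record copied (abbreviated) from the list above; all values are strings.
record = {
    'Cluster (x,y,z)': '-44 -58 +3',
    'size': ' 361',
    'size p-FWE': '0.049000',
    'mass': ' 3797.10',
}


def to_number(text):
    """Return an int or float if the text parses as one, else the stripped text."""
    text = text.strip()
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text  # e.g. the coordinate triple, which is not a single number


converted = {key: to_number(value) for key, value in record.items()}
print(converted)
```

The coordinate column is left as text because it holds three numbers rather than one.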
Here is the updated code:

headers = ['Cluster (x,y,z)', 'size', 'size p-FWE', 'size p-FDR', 'size p-unc',
           'mass', 'mass p-FWE', 'mass p-FDR', 'mass p-unc']
data = []
with open("katerlo.txt", "r") as txtfile:
    data = txtfile.readlines()

final_data = []
for i in range(len(data)):
    if i == 0:
        continue  # skip the first line (the column headers)
    header_index = 0
    line = data[i]
    # This is a crude approach: it relies on the data sitting at fixed positions.
    # It also outputs a list of dicts rather than a table; you can take it from there.
    if len(line) <= 2:
        # A (nearly) empty line ends the top table; the lower part is not parsed.
        print(len(line))
        break
    first_col_data = line[:10]  # extract the odd first column
    cur_json = {}
    cur_json[headers[header_index]] = first_col_data
    header_index = header_index + 1
    line = line[24:]  # skip the wide gap after the first column, which looks fixed
    remaining_col_arr = line.split(" ")  # the spacing used in your text file is the delimiter
    for item in remaining_col_arr:
        item = item.replace('\n', '')  # drop the end-of-line character if it appears
        cur_json[headers[header_index]] = item  # store the remaining column data
        header_index = header_index + 1
    final_data.append(cur_json)

print(final_data)
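The script above stops at the end of the top table and depends on fixed character offsets. As a rough sketch only, the following variant splits on any whitespace and also attaches the per-cluster region lines, which is closer to the table the question asks for. It assumes the file layout is exactly as in the sample; the function names `parse_report` and `coord_key`, the `seed_roi` argument, and the shortened sample text are all hypothetical choices for illustration.

```python
import re

HEADERS = ['Cluster (x,y,z)', 'size', 'size p-FWE', 'size p-FDR', 'size p-unc',
           'mass', 'mass p-FWE', 'mass p-FDR', 'mass p-unc']

# Matches e.g. "248 voxels (69%) covering 5% of atlas.sLOC l (...)"
REGION_RE = re.compile(r'(\d+) voxels \(\d+%\) covering (\d+)% of (.+)')


def coord_key(text):
    """Turn '-42 -06 +30' and '-42 -6 +30' into the same tuple of ints."""
    return tuple(int(part) for part in text.split())


def parse_report(text, seed_roi):
    summary = {}    # coordinate tuple -> dict of the nine summary columns
    rows = []       # one dict per (cluster, region) pair
    current = None  # summary dict of the cluster block currently being read
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('Cluster (x,y,z)'):
            continue  # blank line or the column header line
        if line.startswith('All clusters combined'):
            break  # the combined block is not part of the requested table
        if line.startswith('Cluster'):
            coords = line[len('Cluster'):].rstrip(' :')
            current = summary.get(coord_key(coords))
            continue
        match = REGION_RE.match(line)
        if match:
            if current is not None:
                voxels, covering, region = match.groups()
                rows.append({'Seed ROI': seed_roi, **current,
                             'voxels': int(voxels),
                             'covering': int(covering),
                             'region': region})
            continue
        fields = line.split()
        if len(fields) == 11:  # three coordinates followed by eight statistics
            values = [' '.join(fields[:3])] + fields[3:]
            summary[coord_key(values[0])] = dict(zip(HEADERS, values))
    return rows


sample = """\
Cluster (x,y,z) size size p-FWE size p-FDR size p-unc mass mass p-FWE mass p-FDR mass p-unc
-44 -58 +36 361 0.049000 0.030607 0.000068 3797.10 0.058000 0.036292 0.000081
-42 -06 +30 259 0.108000 0.045083 0.000175 3072.19 0.091000 0.036292 0.000141

Cluster -44 -58 +36 :
248 voxels (69%) covering 5% of atlas.sLOC l (Lateral Occipital Cortex, superior division Left)
62 voxels (17%) covering 0% of atlas.not-labeled
Cluster -42 -6 +30 :
89 voxels (34%) covering 2% of atlas.PreCG l (Precentral Gyrus Left)
All clusters combined :
411 voxels (43%) covering 8% of atlas.sLOC l (Lateral Occipital Cortex, superior division Left)
"""

rows = parse_report(sample, seed_roi="FP_I")
for row in rows:
    print(row)
```

Comparing coordinates as integers is a deliberate choice: the sample writes the same cluster as "-42 -06 +30" in the top table and "-42 -6 +30" in its block header, so a plain string comparison would miss it. If pandas is installed, the resulting list of dicts can be passed to `pandas.DataFrame(rows)`; otherwise `csv.DictWriter` from the standard library can write it to a file.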
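Comments:
Your table is too hard to read. Could you edit it into a readable form?
I have made the change. Does it look better? On my computer the table displays correctly.
Thank you very much for this script. When I run it, I get the error message "replace expected at least 2 arguments, got 1". The error comes from this line: item=item.replace('\n') #replace endline character if it appears. There are much more efficient ways to do this though
My bad, it should be item=item.replace('\n','')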
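For anyone hitting the same error, here is a small sketch of what goes wrong and two ways around it. The value of `item` is just an example string taken from the sample data.

```python
item = "0.000081\n"

# str.replace needs both the text to find and its replacement;
# calling it with a single argument raises a TypeError.
try:
    item.replace('\n')
except TypeError as error:
    print("TypeError:", error)

# The corrected call from the comment above.
fixed = item.replace('\n', '')

# An alternative that only removes newline characters at the end of the string.
stripped = item.rstrip('\n')

print(repr(fixed), repr(stripped))
```

Both give the same result here; `rstrip('\n')` leaves any newline in the middle of a string untouched, which does not matter for lines read with `readlines()`.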