Python: Extracting data from a .seg file

I have a .seg file from which I need to extract the values in columns 3 and 4 according to the cluster number, for example S0:

;; cluster S0 
khatija-ankle 1 0 184 F S U S0
;; cluster S1 
khatija-ankle 1 407 291 F S U S1
khatija-ankle 1 790 473 F S U S1
khatija-ankle 1 1314 248 F S U S1
khatija-ankle 1 1663 187 F S U S1
Here is my code so far:

file1 = open('f1.seg', "w")
file2 = open('f2.seg', "w")

with open('ankle.seg','r') as f:
    for line in f:
        for word in line.split():
            if word == 'S0':
                file1.write(word)
            elif word == 'S1':
                file2.write(word)
How do I create a file for each cluster and write columns 3 and 4 into it?

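Question: How do I create a file for each cluster and write columns 3 and 4 into it?

The test if word == 'S0': compares every single column value on the line. Instead, check which cluster id the last column of the data line holds.

For example: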

# Create a list of column values
data = line.rstrip().split()

# Condition: last value in data == cluster id
if data[-1] == 'S0':
    # write to S0 file
    print("file1.write({})".format(data[2:4]))

elif data[-1] == 'S1':
    # write to S1 file
    print("file2.write({})".format(data[2:4]))
Output (the ;; cluster header lines also match, because their last field is the cluster id):

file1.write(['S0'])
file1.write(['0', '184'])
file2.write(['S1'])
file2.write(['407', '291'])
file2.write(['790', '473'])
file2.write(['1314', '248'])
file2.write(['1663', '187'])
Tested with Python 3.4.2
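The snippet above only prints what would be written. A minimal sketch of a complete version follows, assuming the output files should be named after the cluster id (S0.seg, S1.seg) and that the ;; cluster header lines should be skipped. The function name split_seg_by_cluster and the out_dir parameter are hypothetical and not part of the original answer.

```python
import os
from contextlib import ExitStack


def split_seg_by_cluster(lines, out_dir='.'):
    """Write columns 3 and 4 of each data line to <cluster id>.seg in out_dir.

    Returns the sorted list of cluster ids that were written.
    """
    outputs = {}  # cluster id -> open file object
    with ExitStack() as stack:  # closes every opened file on exit
        for line in lines:
            data = line.split()
            # Skip blank lines, ';; cluster Sx' headers and short lines
            if len(data) < 4 or data[0] == ';;':
                continue
            cluster_id = data[-1]  # last column holds the cluster id
            if cluster_id not in outputs:
                # Assumed naming scheme: one file per cluster id
                path = os.path.join(out_dir, cluster_id + '.seg')
                outputs[cluster_id] = stack.enter_context(open(path, 'w'))
            # data[2] and data[3] are the third and fourth columns
            outputs[cluster_id].write('{} {}\n'.format(data[2], data[3]))
    return sorted(outputs)
```

It could be called as with open('ankle.seg') as f: split_seg_by_cluster(f). Opening the files lazily means new cluster ids need no code changes, unlike a chain of if/elif tests.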

This can certainly be done in Python, but it illustrates nicely why awk is so well suited to slicing up text files:

#! /usr/bin/awk -f
/^;;/ {
      filename = $3 ".seg"
      next
}

{ print $3, $4 > filename }
Output:

$ tail *.seg
==> S0.seg <==
0 184

==> S1.seg <==
407 291
790 473
1314 248
1663 187
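For comparison, the same header-driven logic can be sketched in Python: the ;; cluster line selects the current cluster, and each following data line contributes its third and fourth fields. This is a sketch rather than a translation endorsed by the answer. The function name group_by_header is hypothetical, and it collects the rows in memory instead of writing them straight to files.

```python
def group_by_header(lines):
    """Map each cluster id to its 'col3 col4' strings, using ';;' header lines."""
    groups = {}
    current = None  # cluster id named by the most recent header line
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == ';;':
            # Header such as ';; cluster S0': the third field names the cluster
            if len(fields) >= 3:
                current = fields[2]
                groups.setdefault(current, [])
            continue
        # Data line: keep the third and fourth fields, as awk's $3 and $4
        if current is not None and len(fields) >= 4:
            groups[current].append('{} {}'.format(fields[2], fields[3]))
    return groups
```

Each list can then be written to its own file, for example by joining the rows with newlines and writing them to a file named after the cluster id.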