Python: Extracting data from a .seg file

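I have a .seg file from which I need to extract the values in columns 3 and 4 according to the cluster number (e.g. S0). The file looks like this: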

;; cluster S0 
khatija-ankle 1 0 184 F S U S0
;; cluster S1 
khatija-ankle 1 407 291 F S U S1
khatija-ankle 1 790 473 F S U S1
khatija-ankle 1 1314 248 F S U S1
khatija-ankle 1 1663 187 F S U S1

Here is my code so far:

# One output file per cluster (only S0 and S1 handled so far)
file1 = open('f1.seg', "w")
file2 = open('f2.seg', "w")

with open('ankle.seg', 'r') as f:
    for line in f:
        for word in line.split():
            if word == 'S0':
                file1.write(word)
            elif word == 'S1':
                file2.write(word)

How can I create a file for each cluster and write columns 3 and 4 into it?

Question: How can I create a file for each cluster and write columns 3 and 4 into it?

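With if word == 'S0': you are comparing every single column value against the cluster id. Instead, check which cluster id appears in the last column of each data line. For example: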
with open('ankle.seg') as f:
    for line in f:
        # Create a list of column values
        data = line.rstrip().split()
        if not data:
            continue  # skip blank lines

        # Condition: last value in data == cluster id
        if data[-1] == 'S0':
            # write to S0 file
            print("file1.write({})".format(data[2:4]))

        elif data[-1] == 'S1':
            # write to S1 file
            print("file2.write({})".format(data[2:4]))

Output:
file1.write(['S0'])
file1.write(['0', '184'])
file2.write(['S1'])
file2.write(['407', '291'])
file2.write(['790', '473'])
file2.write(['1314', '248'])
file2.write(['1663', '187'])

Tested with Python 3.4.2
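To write real files instead of printing, the last-column check can be combined with a dictionary of open file handles, one per cluster id, so that no if/elif branch is needed for each cluster. What follows is a minimal sketch, not the answer's own code. The helper name split_by_cluster and the output naming scheme <cluster>.seg are illustrative assumptions. Header lines starting with ";;" are skipped, because their last field is also the cluster id; that is where the stray ['S0'] and ['S1'] entries in the output above come from.

```python
import os


def split_by_cluster(in_path, out_dir="."):
    """Write columns 3 and 4 of every data line to <cluster>.seg.

    Hypothetical helper: the cluster id is taken from the last column,
    and each output file is named after it (e.g. S0.seg, S1.seg).
    """
    handles = {}  # cluster id -> open output file
    try:
        with open(in_path) as f:
            for line in f:
                data = line.split()
                # Skip blank lines and ';; cluster ...' header lines
                if not data or data[0].startswith(";;"):
                    continue
                cluster = data[-1]
                if cluster not in handles:
                    path = os.path.join(out_dir, cluster + ".seg")
                    handles[cluster] = open(path, "w")
                # Columns 3 and 4 are indices 2 and 3 (0-based)
                handles[cluster].write("{} {}\n".format(data[2], data[3]))
    finally:
        # Close every file we opened, even if parsing fails midway
        for handle in handles.values():
            handle.close()
```

Usage: split_by_cluster('ankle.seg') creates S0.seg and S1.seg in the current directory. New cluster ids get their own file automatically.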
This can of course be done in Python, but it is a perfect illustration of why awk is so well suited to slicing up text files:

#! /usr/bin/awk -f

/^;;/ {
    filename = $3 ".seg"
    next
}

{
    print $3, $4 > filename
}

Output:

$ tail *.seg
==> S0.seg <==
0 184

==> S1.seg <==
407 291
790 473
1314 248
1663 187
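For comparison, here is a rough Python translation of the awk script's header-driven logic. It is a sketch, not code from either answer. The function names split_like_awk and write_groups are hypothetical. Like the awk script, each ";;" header line picks the output file from its third field, and every following line contributes fields 3 and 4. Unlike awk, blank lines are skipped, and rows are collected in memory before being written rather than streamed.

```python
import os


def split_like_awk(lines):
    """Group fields 3 and 4 under the file chosen by the last ';;' header.

    Returns {filename: [row, ...]} so the grouping can be inspected.
    """
    groups = {}
    filename = None
    for line in lines:
        fields = line.split()
        if line.startswith(";;"):
            filename = fields[2] + ".seg"  # like: filename = $3 ".seg"
            continue                       # like: next
        if fields and filename is not None:
            # like: print $3, $4 (fields joined by a single space)
            groups.setdefault(filename, []).append(" ".join(fields[2:4]))
    return groups


def write_groups(groups, out_dir="."):
    """Write each group of rows to its own file, one row per line."""
    for filename, rows in groups.items():
        with open(os.path.join(out_dir, filename), "w") as out:
            for row in rows:
                out.write(row + "\n")

# Usage (assuming ankle.seg is in the current directory):
#     with open("ankle.seg") as f:
#         write_groups(split_like_awk(f))
```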