Python: numbering each entry separately

python sorting file-io formatting

I have data like this:

car    trans  +  1,4,6,8
plane  trans  +  3,5,7,9,4,3
train  trans  -  2,4,6,7
bus    trans  -  1,3,4,5,6,7,8

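This needs to be reorganized into the format below. Basically, I want to take every second number (the 2nd, 4th, 6th, ...) from the comma-separated list in the fourth column. If the sign is "+", the number goes in the fourth column and the number plus 1 goes in the fifth column. If the sign is "-", the number goes in the fifth column and the number minus 1 goes in the fourth column.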

car.1    trans  +  4  5
car.2    trans  +  8  9
plane.1  trans  +  5  6
plane.2  trans  +  9  10
plane.3  trans  +  3  4
train.1  trans  -  3  4
train.2  trans  -  6  7
bus.1    trans  -  2  3
bus.2    trans  -  4  5
bus.3    trans  -  6  7

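Below is the code I have now. It gives the output I want, except that the names in the first column are not numbered the way I want (car.1, car.2). I know I have to add the number in the outfile.write() line, but I don't know how to build a string that numbers each of the comma-separated values taken from the original data. Please help.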

import sys

infileName = sys.argv[1]
outfileName = sys.argv[2]

def getGenes(infile, outfile):

    infile = open(infileName, "r")
    outfile = open(outfileName, "w")

    while 1:
        line = infile.readline()
        if not line: break
        wrds = line.split()
        if len(wrds) < 4:
            continue   # skip blank or incomplete lines
        comma = wrds[3].split(",")
        fivess = comma[1::2]   # every second value: the 2nd, 4th, 6th, ...

        name = wrds[0]
        chr = wrds[1]
        type = wrds[2]
        print(type)
        if type == "+":
            for jj in fivess:
                start = jj
                stop = int(jj) + 1
                # the entry number (.1, .2, ...) still has to be added after name
                outfile.write('%s\t%s\t%s\t%s\t%s\n' % (name, chr, type, start, stop))
        elif type == "-":
            for jj in fivess:
                stop = jj
                start = int(jj) - 1
                # the entry number (.1, .2, ...) still has to be added after name
                outfile.write('%s\t%s\t%s\t%s\t%s\n' % (name, chr, type, start, stop))

getGenes(infileName, outfileName)

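Actually, you don't have to do this in the outfile.write() line. Ideally, you should sort the input first, so that the entries are already in the right order when you process them and you don't have to think about ordering afterwards. Here is the code I wrote, using yours as a skeleton but with a few things clarified and made more robust: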
import sys

infileName_s = sys.argv[1]
outfileName_s = sys.argv[2]

def getGenes(infileName, outfileName):

    infile = open(infileName, "r")
    outfile = open(outfileName, "w")

    x = infile.read()
    infile.close()   # make sure to close infile and outfile
    alldata = []
    for line in x.splitlines():
        fields = line.split()
        if len(fields) < 4:   # skip blank or incomplete lines
            continue
        fields[-1] = fields[-1].split(',')   # "1,4,6,8" -> ['1', '4', '6', '8']
        alldata.append(fields)

    alldata = sorted(alldata) # sort alphabetically by name

    mod_alldata = []

    for line in alldata: # create data structures
        for i in range(1, len(line[-1]), 2):   # i = 1, 3, 5, ...: the 2nd, 4th, 6th values
            label = line[0] + '.' + str(i // 2 + 1)   # car.1, car.2, ...; // keeps it an integer
            if line[2] == '+':
                mod_alldata.append([label, line[1], line[2], line[3][i], int(line[3][i]) + 1])
            else:
                mod_alldata.append([label, line[1], line[2], int(line[3][i]) - 1, line[3][i]])

    for line in mod_alldata: # write to file
        outfile.write(line[0] + '\t' + line[1] + '\t' + line[2] + '\t' + str(line[3]) + '\t' + str(line[4]) + '\n')
    outfile.close()

getGenes(infileName_s, outfileName_s)

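Notes:

Always close the files you open.

Watch variable scope: the names used inside the function (infileName/infile and outfileName/outfile) are distinct from the ones used outside it.
Using a range with a step of 2 (as I did here: range(1, len(line[-1]), 2)) is handy for picking every second element (indices 1, 3, 5, ...), and it also copes with odd-length or empty lists.
I used sorted() to sort alphabetically because I did not know how you wanted the entries ordered. If you want a different order, let me know in the comments.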
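
To illustrate the note about the step-2 range, here is a small sketch. The helper name every_second is made up for this illustration and is not part of the code above. It shows that the range picks the 2nd, 4th, 6th, ... values and simply produces nothing for an empty list:

```python
def every_second(values):
    """Return (entry number, value) pairs for the 2nd, 4th, 6th, ... items.

    Hypothetical helper, for illustration only.
    """
    pairs = []
    for i in range(1, len(values), 2):          # i = 1, 3, 5, ...
        pairs.append((i // 2 + 1, values[i]))   # // keeps the entry number an int
    return pairs

print(every_second("1,4,6,8".split(",")))        # [(1, '4'), (2, '8')]
print(every_second("1,3,4,5,6,7,8".split(","))) # [(1, '3'), (2, '5'), (3, '7')]
print(every_second([]))                          # []
```

The integer division matters on Python 3, where i / 2 + 1 would give 1.0, 2.0, ... and produce labels such as car.1.0.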

This is the output written to the specified text file:
bus.1   trans   -   2   3
bus.2   trans   -   4   5
bus.3   trans   -   6   7
car.1   trans   +   4   5
car.2   trans   +   8   9
plane.1 trans   +   5   6
plane.2 trans   +   9   10
plane.3 trans   +   3   4
train.1 trans   -   3   4
train.2 trans   -   6   7
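
The output above is alphabetical, whereas the listing in the question keeps the input order (car, plane, train, bus). If that order is what is wanted, one option is to skip the sort and number the values while reading. This is a minimal sketch, assuming whitespace-separated columns as in the sample data; the function name expand_line is hypothetical:

```python
def expand_line(line):
    """Turn one input line into a list of tab-separated output rows.

    Hypothetical helper; assumes four whitespace-separated columns,
    the last one being a comma-separated list of integers.
    """
    fields = line.split()
    if len(fields) < 4:              # blank or incomplete line: nothing to emit
        return []
    name, kind, strand, numbers = fields[:4]
    rows = []
    # [1::2] takes every second value; enumerate numbers them from 1
    for n, value in enumerate(numbers.split(",")[1::2], start=1):
        if strand == "+":
            start, stop = int(value), int(value) + 1
        else:
            start, stop = int(value) - 1, int(value)
        rows.append("%s.%d\t%s\t%s\t%d\t%d" % (name, n, kind, strand, start, stop))
    return rows

sample = ["car trans + 1,4,6,8", "train trans - 2,4,6,7"]
for line in sample:                  # rows come out in input order
    for row in expand_line(line):
        print(row)
```

Because each line is expanded on its own, the rows can be written to the output file as they are produced, with no intermediate sorting step.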

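Have you considered putting the processed data into a list, then using list.sort() to get the desired order before writing it back?
@jornsharpe I was about to suggest exactly the same thing. If the default text ordering does not fit the requirements (at first glance it looks like it should), you can pass a function through the key parameter to control the comparison.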
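
As a sketch of the key-function idea from the comment: plain text ordering compares labels character by character, so "plane.10" sorts before "plane.2". A key function can split each label into a name and an integer instead. The helper name label_key and the sample labels are made up for illustration:

```python
def label_key(label):
    """Sort key: (name, entry number as int) for labels such as 'plane.10'."""
    name, _, number = label.rpartition(".")
    return (name, int(number))

labels = ["plane.10", "car.2", "plane.2", "car.1"]
print(sorted(labels))                  # ['car.1', 'car.2', 'plane.10', 'plane.2']
print(sorted(labels, key=label_key))   # ['car.1', 'car.2', 'plane.2', 'plane.10']
```

The same key function can be passed to list.sort(key=label_key) to sort a list in place.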