
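In Python, how do I delete lines that start with the same (but random) characters?

Tags: python, bioinformatics, matching, dna-sequence

I am trying to delete lines in a file that start with the same 5 characters. However, the first 5 characters are random (I do not know in advance what they will be).

I have code that reads the last 5 characters of the file's first line and matches them against the first 5 characters of some other line in the file that starts with those same 5 characters. The problem is that the code fails when two or more matching lines share the same first 5 characters. I need something that reads every line in the file and, among lines that share the same first 5 characters, deletes one of them.

Example (the problem):

After one is removed from the file, the result I need is:

CCTGGATGGCTTATATAAGAT***GTTAT*** ***GTTAT***ATAATATACCACC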









FILE_NAME = "data.txt"                       # the name of the file to read in
NR_MATCHING_CHARS = 5                        # the number of leading characters that need to match

lines = set()                                # prefixes of the lines that have already been printed
with open(FILE_NAME, "r") as inF:            # open the file
    for line in inF:                         # for every line
        line = line.strip()                  # drop surrounding whitespace and the newline
        if line == "":                       # skip empty lines
            continue
        beginOfSequence = line[:NR_MATCHING_CHARS]
        if beginOfSequence not in lines:     # if no line with this prefix was printed yet
            print(line)                      # print the line
            lines.add(beginOfSequence)       # and remember its prefix as already printed
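
The snippet above prints the surviving lines rather than changing the file. If the goal is a cleaned file on disk, the same idea can be wrapped in a function and written to a second file. The following is only a sketch under that assumption: the names `dedupe_by_prefix`, `dedupe_file`, and the output path are hypothetical and not part of the original answer. It keeps the first line seen for each 5-character prefix and drops later ones.

```python
def dedupe_by_prefix(lines, n_chars=5):
    """Keep only the first non-empty line seen for each n_chars-long prefix.

    Hypothetical helper: `lines` is any iterable of strings (a list or an
    open file object). Lines shorter than n_chars use the whole line as prefix.
    """
    seen = set()    # prefixes that have already been kept
    kept = []       # surviving lines, in their original order
    for raw in lines:
        line = raw.strip()
        if not line:            # skip empty lines
            continue
        prefix = line[:n_chars]
        if prefix not in seen:
            seen.add(prefix)
            kept.append(line)
    return kept


def dedupe_file(in_path, out_path, n_chars=5):
    """Read in_path, write the de-duplicated lines to out_path.

    Returns the number of lines written. Writing to a separate file
    (an assumption of this sketch) avoids overwriting the input while reading it.
    """
    with open(in_path, "r") as src:
        kept = dedupe_by_prefix(src, n_chars)
    with open(out_path, "w") as dst:
        dst.writelines(line + "\n" for line in kept)
    return len(kept)
```

For example, `dedupe_file("data.txt", "data_dedup.txt")` would leave `data.txt` untouched and write the filtered lines to `data_dedup.txt`. Which of the matching lines should survive is not stated in the question; this sketch keeps the first one encountered.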
