In Python, how do I delete lines that start with the same (but random) characters?


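I am trying to delete lines in a file that start with the same 5 characters. However, the first 5 characters are random (I don't know in advance what they will be).

I have code that reads the last 5 characters of the first line of the file and matches them against the first 5 characters of some other line in the file that starts with those same 5 characters. The problem is that the code fails when two or more lines share the same first 5 characters. I need something that reads all lines in the file and, where lines share the same first 5 characters, removes one of them.

Example (problem):

After removing one of them from the file, the result I need is: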

CCTGGATGGCTTATATAAGAT***GTTAT***

***GTTAT***ATAATATACCACCGGGCTGCTT

(without the third line)

I would appreciate it if you could also explain in words how to do this.

You can do it like this, for example:

FILE_NAME = "data.txt"                       # name of the file to read
NR_MATCHING_CHARS = 5                        # number of leading characters that must match

lines = set()                                # prefixes of the lines that have already been printed
with open(FILE_NAME, "r") as inF:            # open the file
    for line in inF:                         # go through it line by line
        line = line.strip()                  # drop surrounding whitespace and the newline
        if line == "":                       # skip empty lines
            continue
        beginOfSequence = line[:NR_MATCHING_CHARS]
        if beginOfSequence not in lines:     # no earlier line started with this prefix
            print(line)                      # keep (print) the line
            lines.add(beginOfSequence)       # remember the prefix so later matches are skipped
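
To try the same idea without a file, here is a minimal sketch that wraps the logic in a function. The name `drop_duplicate_prefixes`, its `prefix_len` parameter, and the third sample line are assumptions made up for illustration; the first two sample lines are the sequences from the question with the emphasis markers removed.

```python
def drop_duplicate_prefixes(lines, prefix_len=5):
    """Keep only the first line seen for each distinct leading prefix."""
    seen = set()   # prefixes that have already been kept
    kept = []      # surviving lines, in their original order
    for raw in lines:
        line = raw.strip()
        if not line:           # ignore empty lines
            continue
        prefix = line[:prefix_len]
        if prefix not in seen:
            seen.add(prefix)
            kept.append(line)
    return kept


sample = [
    "CCTGGATGGCTTATATAAGATGTTAT",
    "GTTATATAATATACCACCGGGCTGCTT",
    "GTTATCCCGGGAAATTT",  # hypothetical third line, not taken from the question
]
result = drop_duplicate_prefixes(sample)
print("\n".join(result))
```

Because the set only stores prefixes, the first line with a given prefix is always the one that survives. If a different line should be kept, the selection rule would have to change.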

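Welcome to StackOverflow. Please read and follow the posting guidelines in the help documentation, as suggested when you created this account; they apply here. StackOverflow is not a design, coding, research, or tutorial resource. However, if you follow whatever resources you find online, make an honest coding attempt, and run into a problem, you will have a good example to post. The question as posted does not appear to include any attempt to solve the problem. StackOverflow expects you to try to solve your own problem first, because your attempts help us understand what you need. Please edit the question to show what you have tried, so that it illustrates the specific problem you are having. For more information, see the help documentation. Show us the code you have written so far so that we can see how to improve it.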
