
Python: extracting a subset of sequences from a fastq file with set() and FastqGeneralIterator()


I have two fastq files and I only need the fastq records that are shared between them. However, the script fails when it comes to writing the two output files that should contain only the matched records. I am using set() to keep memory usage down. Can anyone help me figure this out? The code is below:

from Bio.SeqIO.QualityIO import FastqGeneralIterator

infileR1 = open('R1.fastq', 'r')
infileR2 = open('R2.fastq', 'r')
output1 = open('matchedR1.fastq', 'w')
output2 = open('matchedR2.fastq', 'w')

# collect the read names found in R1
all_names1 = set()
for line in infileR1 :
    if line[0:11] == '@GWZHISEQ01':
        read_name = line.split()[0]
        all_names1.add(read_name)

# collect the read names found in R2
all_names2 = set()
for line in infileR2 :
    if line[0:11] == '@GWZHISEQ01':
        read_name = line.split()[0]
        all_names2.add(read_name)

# keep only the names present in both files
shared_names = set()
for item in all_names1:
    if item in all_names2:
        shared_names.add(item)

# writing out the matched records:

for title, seq, qual in FastqGeneralIterator(infileR1):
    if title in shared_names:
        output1.write("%s\n%s\n+\n%s\n" % (title, seq, qual))

for title, seq, qual in FastqGeneralIterator(infileR2):
    if title in shared_names:
        output2.write("%s\n%s\n+\n%s\n" % (title, seq, qual))

infileR1.close() 
infileR2.close()
output1.close()
output2.close()

Without knowing the exact error (you should add a description of it to the question rather than just saying the script fails), my guess is that you are reading from an exhausted handle.
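
As a minimal illustration of what an exhausted handle looks like (any existing text file works in place of R1.fastq):

handle = open('R1.fastq', 'r')
for line in handle:
    pass                 # the first loop reads all the way to the end of the file
print(list(handle))      # prints [] -- a second iteration over the same handle yields nothing
handle.close()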

You open the handle with infileR1 = open('R1.fastq', 'r') and then read through the whole file with for line in infileR1: to collect the titles. When you later pass the same handle to FastqGeneralIterator, the file pointer is already at the end, so the iterator starts at the end of the file and yields nothing. Either rewind the file with infileR1.seek(0) before the final loops, or, as the documentation recommends when you can pass a filename, change the code to use the SeqIO wrapper:

from Bio import SeqIO

infileR1.close()  # close the exhausted handle; SeqIO opens the file itself

for record in SeqIO.parse("R1.fastq", "fastq"):
    # Do business here
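
For completeness, here is a minimal sketch that avoids the exhausted-handle problem by opening fresh handles for the second pass, still using FastqGeneralIterator as in the question. The helper names read_names and write_matched are only illustrative; note that FastqGeneralIterator returns each title without the leading '@', so the names are stored in the sets without it and the '@' is added back when writing:

from Bio.SeqIO.QualityIO import FastqGeneralIterator

def read_names(path):
    # collect the first token of every matching header line, without the leading '@'
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith('@GWZHISEQ01'):
                names.add(line.split()[0][1:])
    return names

# set intersection keeps only the names present in both files
shared_names = read_names('R1.fastq') & read_names('R2.fastq')

def write_matched(in_path, out_path):
    # fresh handle for the second pass, so FastqGeneralIterator starts at the beginning
    with open(in_path) as handle, open(out_path, 'w') as out:
        for title, seq, qual in FastqGeneralIterator(handle):
            if title.split()[0] in shared_names:
                out.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))

write_matched('R1.fastq', 'matchedR1.fastq')
write_matched('R2.fastq', 'matchedR2.fastq')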