Python: opening a gzip file whose path is read by argparse


I can easily open a gzip file like this:

import gzip
import sys
file1 = gzip.open(sys.argv[1], 'rb')


I then tried using argparse:

import argparse
import gzip
parser = argparse.ArgumentParser()
parser.add_argument('vcf', help='gzipped VCF')
args = parser.parse_args()

When I try to read my gzip file like this:


with gzip.open(args.vcf, 'rb') as file1:
    for line in file1:
        print(line)
        if line.startswith("#CHROM"):
            print(line)

I get the following error:

TypeError: startswith first arg must be bytes or a tuple of bytes, not str

This seems to be because each of my lines has a b prefix at the start:

b'##fileformat=VCFv4.2\n'

I can work around this by using io.TextIOWrapper:

import io
with io.TextIOWrapper(gzip.open(args.vcf, 'r')) as vcf:
    for line in vcf:
        print(line)

Why doesn't the first approach work?

Thanks, everyone!

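In the first case you open the file with rb, which means read + binary mode, so every line you read is a bytes object. Changing line.startswith("#CHROM") to line.startswith(b"#CHROM") fixes the problem. As the error message says, the cause is that a str is being compared with bytes. If you try b'hello'.startswith("h") you get the same error, whereas b'hello'.startswith(b"h") returns True.

What does this have to do with argparse? Nothing: whether vcf comes from sys.argv or from argparse has no effect on how the file is opened and read (provided both yield the same file name).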
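To make the point concrete, here is a minimal sketch of both fixes side by side: matching with a bytes prefix in binary mode, and opening the file in text mode ('rt') so that lines are already str. The helper names (parse_cli, chrom_lines_bytes, chrom_lines_text) and the tiny demo file are illustrative assumptions, not part of the original question; the argument list is passed to parse_args explicitly only so the sketch runs without a real command line.

```python
import argparse
import gzip
import os
import tempfile


def parse_cli(argv=None):
    """Parse the positional 'vcf' argument (argv=None falls back to sys.argv)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("vcf", help="gzipped VCF")
    return parser.parse_args(argv)


def chrom_lines_bytes(path):
    """Binary mode: lines are bytes, so the prefix must be bytes too."""
    with gzip.open(path, "rb") as handle:
        return [line for line in handle if line.startswith(b"#CHROM")]


def chrom_lines_text(path):
    """Text mode: gzip.open decodes for us, so lines are str."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [line for line in handle if line.startswith("#CHROM")]


# Hypothetical demo data: a two-line gzipped file standing in for a real VCF.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.vcf.gz")
with gzip.open(demo_path, "wb") as out:
    out.write(b"##fileformat=VCFv4.2\n#CHROM\tPOS\tID\n")

# The path could equally come from sys.argv[1]; the reading code is unchanged.
args = parse_cli([demo_path])
bytes_hits = chrom_lines_bytes(args.vcf)
text_hits = chrom_lines_text(args.vcf)
print(bytes_hits)
print(text_hits)
```

Text mode is probably the more convenient choice for a VCF, since it achieves the same effect as wrapping the file in io.TextIOWrapper without the extra import.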