Python: skipping different types of comment lines in csv.DictReader

python csv dictionary

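I have several tab-delimited files that I want to read into dicts using csv.DictReader. Each file contains several comment lines, starting with '#' or '\t', before the actual data begins. The number of comment lines varies from file to file. I have been trying an approach outlined elsewhere, but can't seem to get it to work.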

Here is my current code:

def load_database_snps(inputFile):
    '''This function takes a txt tab delimited input file (in house database) and returns a list of dictionaries for each variant'''
    idStore = [] #empty list for storing variant records
    with open(inputFile, 'r+') as varin:
        idStoreDictgroup = csv.DictReader((row for row in  varin if row.startswith('hr', 1, 2)),delimiter='\t') #create a generator; dictionary per snp (row) in the file
        idStoreDictgroup.fieldnames = [field.strip() for field in idStoreDictgroup.fieldnames] #strip whitespace from field names
        print(type(idStoreDictgroup))
        for d in idStoreDictgroup: #iterate over dictionaries in varin_dictgroup
            print(d)
            idStore.append(d) #attach to var_list
    return idStore

Here is a sample of the input file:

## SM=Sample,AD=Total Allele Depth, DP=Total Depth
## het;;; and homo;;; are breakdowns of variant read counts per sample - chr1:10002921 T>G AD=34 het:4;11;7;12 (sum=34)


        Hetereozygous                                       Homozygous                                      
    Chr     Start      End            ref           |A|     |C|     |G|     |T|     HetCount        |A|     |C|     |G|     |T|     HomCount        TotalCount      SampleCount
    chr1    10001102        10001102        T       0       0       SM=1;AD=22;DP=38        0       1       0       0       0       0       0       1       138     het:22; homo:-  
    chr1    10002921        10002921        T       0       0       SM=4;AD=34;DP=63        0       4       0       0       0       0       0       4       138     het:4;11;7;12;  homo:-

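All the lines I want to read start with 'Chr' or 'chr'. I think it isn't working because I need to iterate over the generator to reformat the field names, and that exhausts it before the rows are read into dictionaries.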

The error message I get is:

Traceback (most recent call last):
  File "snp_freq_V1-1_export.py", line 99, in <module>
    snp_check_wrapper(inputargs.snpstocheck, inputargs.snp_database_location)
  File "snp_freq_V1-1_export.py", line 92, in snp_check_wrapper
    snpDatabase = load_database_snps(databaseInputFile) #store database variants in snp_database (dictionary)
  File "snp_freq_V1-1_export.py", line 53, in load_database_snps
    idStoreDictgroup.fieldnames = [field.strip() for field in idStoreDictgroup.fieldnames] #strip whitespace from field names
TypeError: 'NoneType' object is not iterable

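I also tried the opposite of my current code, explicitly excluding lines that start with '#' and '\t'. That didn't work either and just gave me an empty dictionary.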

You should be able to skip all the leading lines until you reach the one that starts with chr, for example:

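import csv
from itertools import dropwhile

with open('somefile') as fin:
    # drop leading lines until the first one whose text (ignoring case and
    # leading whitespace) starts with 'chr'; that line becomes the header
    start = dropwhile(lambda L: not L.lower().lstrip().startswith('chr'), fin)
    for row in csv.DictReader(start, delimiter='\t'):
        print(row)  # do something with each row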
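
As for the TypeError in the question, a likely cause is the filter itself rather than an exhausted generator: `row.startswith('hr', 1, 2)` only tests the one-character slice `row[1:2]`, which can never start with the two-character prefix 'hr'. The generator then yields nothing, so `DictReader.fieldnames` comes back as `None`. A small sketch using made-up lines (not the asker's real data):

```python
import csv

# Made-up lines standing in for the real file (an assumption for illustration)
lines = ["## comment\n", "Chr\tStart\tEnd\n", "chr1\t10001102\t10001102\n"]

# startswith(prefix, start, end) tests row[start:end]; a 1-character slice
# can never start with the 2-character prefix 'hr', so nothing is kept
kept = [row for row in lines if row.startswith('hr', 1, 2)]
print(kept)

# With no rows at all there is no header line for DictReader to read
reader = csv.DictReader(iter(kept), delimiter='\t')
print(reader.fieldnames)
```

Widening the slice to `row.startswith('hr', 1, 3)` would match lines like `chr1...`, but the `dropwhile` approach above avoids depending on character positions.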

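Is there only one per file? That is, the comment/header block above isn't repeated several times within each file?

Yes. From the sample file, I want it to use the Chr Start ... line as the header and all subsequent lines as the values for my dictionaries.

This nicely avoids the unwanted lines. But how can I strip the whitespace from the dictionary keys?

Right, I've managed to solve that now. Thanks very much for your help! :)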
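
The thread does not show how the whitespace in the dictionary keys was finally removed. One possible approach, sketched here under assumptions, is to combine `dropwhile` with the field-name stripping from the question. The helper name `load_rows` and the sample text are hypothetical, invented for illustration:

```python
import csv
import io
from itertools import dropwhile

# Hypothetical sample: comment lines start with '#' or a tab, and the
# header has stray spaces around some field names
sample = (
    "## SM=Sample,AD=Total Allele Depth\n"
    "\n"
    "\tHetereozygous\t\tHomozygous\n"
    "Chr \t Start\tEnd\tref\n"
    "chr1\t10001102\t10001102\tT\n"
    "chr1\t10002921\t10002921\tT\n"
)

def load_rows(lines):
    """Skip leading non-data lines, then return rows as dicts with stripped keys."""
    start = dropwhile(lambda line: not line.lower().lstrip().startswith('chr'), lines)
    reader = csv.DictReader(start, delimiter='\t')
    if reader.fieldnames is None:  # no header line was found
        return []
    # Reading .fieldnames consumes the header line; assigning replaces the keys
    reader.fieldnames = [name.strip() for name in reader.fieldnames]
    return list(reader)

rows = load_rows(io.StringIO(sample))
print(rows[0])
```

The `None` check guards against the TypeError from the question when a file contains no header line. Note that the real header repeats names such as |A| and |C|; with duplicate field names, DictReader keeps only the last value for each repeated key, so those columns would need distinct names to survive.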