Create as many files as there are items in two lists in Python


Consider the file testbam.txt:

/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg001G.GRCh38DH.target.bam
/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg002G.GRCh38DH.target.bam
/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg014G.GRCh38DH.target.bam


and the file testbai.txt:


/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg001G.GRCh38DH.target.bai
/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg002G.GRCh38DH.target.bai
/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg014G.GRCh38DH.target.bai

They always have the same length, and I wrote a function to find it:

def file_len(fname):
    with open(fname) as f:
        for i,l in enumerate(f):
            pass
        return i+1

n = file_len('/groups/cgsd/alexandre/python_code/src/testbai.txt')
print(n)
3

Then I created two lists by opening the files and processing each line:

content = []
with open('/groups/cgsd/alexandre/python_code/src/testbam.txt') as bams:
    for line in bams:
        content.append(line.strip().split())

print(content)

content2 = []
with open('/groups/cgsd/alexandre/python_code/src/testbai.txt') as bais:
    for line in bais:
        content2.append(line.strip().split())

print(content2)

Now I have a JSON file named mutec.json, in which I want to replace certain parts with the items from the lists:

{
    "Mutect2.gatk_docker": "broadinstitute/gatk:4.1.4.1",
    "Mutect2.intervals": "/groups/cgsd/alexandre/gatk-workflows/src/interval_list/Basic_Core_xGen_MSI_TERT_HPV_EBV_hg38.interval_list",
    "Mutect2.scatter_count": 30,
    "Mutect2.m2_extra_args": "--downsampling-stride 20 --max-reads-per-alignment-start 6 --max-suspicious-reads-per-alignment-start 6",
    "Mutect2.filter_funcotations": true,
    "Mutect2.funco_reference_version": "hg38",
    "Mutect2.run_funcotator": true,
    "Mutect2.make_bamout": true,
    "Mutect2.funco_data_sources_tar_gz": "/groups/cgsd/alexandre/gatk-workflows/mutect2/inputs/funcotator_dataSources.v1.6.20190124s.tar.gz",
    "Mutect2.funco_transcript_selection_list": "/groups/cgsd/alexandre/gatk-workflows/mutect2/inputs/transcriptList.exact_uniprot_matches.AKT1_CRLF2_FGFR1.txt",
  
    "Mutect2.ref_fasta": "/groups/cgsd/alexandre/gatk-workflows/src/ref_Homo38_HPV/Homo_sapiens_assembly38_chrHPV.fasta",
    "Mutect2.ref_fai": "/groups/cgsd/alexandre/gatk-workflows/src/ref_Homo38_HPV/Homo_sapiens_assembly38_chrHPV.fasta.fai",
    "Mutect2.ref_dict": "/groups/cgsd/alexandre/gatk-workflows/src/ref_Homo38_HPV/Homo_sapiens_assembly38_chrHPV.dict",
    
    "Mutect2.tumor_reads": "<<<N_item_of_list_content>>>",
    "Mutect2.tumor_reads_index": "<<<N_item_of_list_content2>>>",
  }

Note that this part:

   "Mutect2.tumor_reads": "<<<N_item_of_list_content>>>",
   "Mutect2.tumor_reads_index": "<<<N_item_of_list_content2>>>",

should be replaced with the respective items from the lists, and I want to write the result of each modification to a new file.

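The final result would be 3 files: mutect1.json, with the first item from testbam.txt and the first item from testbai.txt; mutect2.json, with the second item from testbam.txt and the second item from testbai.txt; and a third file following the same reasoning.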

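Note that the markers <<<N_item_of_list_content>>> and <<<N_item_of_list_content2>>> are not necessarily hard-coded in the file; I wrote them myself only to make clear what I want to replace.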

First, even though it is unrelated to the question, some of your code is not really Pythonic:

def file_len(fname):
    with open(fname) as f:
        for i,l in enumerate(f):
            pass
        return i+1

You iterate over enumerate with a for loop when you could simply do:

def file_len(fname):
    with open(fname) as f:
        # a file object has no len(); count lines by consuming the iterator
        return sum(1 for _ in f)

because f is an iterator over the lines of the file.

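Now to your question. You want to replace certain elements in a file with data from two other files.

In your original question, the strings to replace are enclosed in triple angle brackets.

I would use: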

import re

rx = re.compile(r'<<<.*?>>>')        # pattern identifying the text to replace

with open('.../testbam.txt') as bams, open('.../testbai.txt') as bais, \
     open('.../mutect.json') as src:
    # each step yields one (bam, bai) pair of replacement strings
    for i, pair in enumerate(zip(bams, bais), 1):
        reps = [s.strip() for s in pair]   # drop the trailing newlines
        src.seek(0)                  # rewind the template file
        with open(f'mutect{i}.json', 'w') as fdout:  # one output file per pair
            rep_index = 0            # first use the string from the first file
            for line in src:
                if rx.search(line):  # does this line hold a placeholder?
                    # a function replacement avoids backslash-escape processing
                    line = rx.sub(lambda m: reps[rep_index], line)
                    rep_index = 1 - rep_index    # next time use the other string
                fdout.write(line)
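
If the placeholders are not literally present in the template, one alternative is to parse the template with the standard `json` module and set the two keys directly. The sketch below assumes the template is valid JSON (the trailing comma after the last entry in the listing from the question would have to be removed first). The function name `write_configs`, the `out_dir` parameter, and the `mutect{i}.json` naming are illustrative choices, not part of the approach above.

```python
import json
from pathlib import Path


def write_configs(template_path, bam_list_path, bai_list_path, out_dir):
    """Write one JSON config per (bam, bai) pair and return the output paths."""
    template = json.loads(Path(template_path).read_text())
    out_dir = Path(out_dir)
    written = []
    with open(bam_list_path) as bams, open(bai_list_path) as bais:
        # zip pairs line N of one list with line N of the other
        for i, (bam, bai) in enumerate(zip(bams, bais), 1):
            config = dict(template)  # shallow copy is enough for flat keys
            config["Mutect2.tumor_reads"] = bam.strip()
            config["Mutect2.tumor_reads_index"] = bai.strip()
            out_path = out_dir / f"mutect{i}.json"
            out_path.write_text(json.dumps(config, indent=4) + "\n")
            written.append(out_path)
    return written
```

Calling `write_configs('mutect.json', 'testbam.txt', 'testbai.txt', '.')` with three lines per list would produce `mutect1.json` through `mutect3.json`. Keep in mind that `zip` stops at the shorter input, so a length mismatch between the two lists is silently truncated.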

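Your question lacks focus!! It asks too many things at once. Ask about one thing: the point where you hit a problem or an error.

Why not? I did my best to provide all the details.

Look at the first code block above. It opens a for loop and all it does is pass. What is that?

Then please let me know what I should remove from the question.

@AvenDesta: the function is not really Pythonic, but it returns the expected result...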