How do I change part of a filename when it is held in a variable in Python?


python bash


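I currently have a Python script that takes a file as a command-line argument, does what it needs to do, and then writes an output file whose name is the input name with "_all_ORF.fsa_aa" appended. I would like to actually edit the filename rather than append to it, but I am confused about variables: when the filename is held in a variable, I am not sure how to do this.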

Here is an example of how the script is called from the command line:

gL=genomeList.txt   #Text file containing a list of genomes to loop through.             

for i in $(cat ${gL}); do
    #some other stuff ; 
    python ./find_all_ORF_from_getorf.py ${i}_getorf.fsa_aa ; 
    done

Here is part of the Python script (find_all_ORF_from_getorf.py):

Currently, the output filename is "Genome_file_getorf.fsa_aa_all_ORF.fsa_aa". I would like to remove the first ".fsa_aa" so that the output looks like "Genome_file_getorf_all_ORF.fsa_aa". How can I do that? I cannot work out how to edit it.

I have already searched, but it seems I cannot edit the contents of the variable, only append to it.

Thanks

Regarding your bash code, you may find the following snippet useful. I find it more readable and use it often when iterating over lines:

while IFS= read -r line; do
    #some other stuff ; 
    # quote the argument so names containing spaces stay in one piece
    python ./find_all_ORF_from_getorf.py "${line}_getorf.fsa_aa"
done < genomeList.txt

At this point your infile will look like "Genome_file_getorf.fsa_aa". One option is to split this string on "." and take the first item:

name = infile.split('.')[0]

If you know the filename may contain several "." characters, such as "Myfile.out.old", and you only want to strip the last extension:

name = infile.rsplit('.',1)[0]

A third option: if you know that all the files end in ".fsa_aa", you can slice the string with a negative index. As ".fsa_aa" has 7 characters:

name = infile[:-7]

These three options rely on Python's built-in string methods and string slicing.
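To make the difference between the three options concrete, here is a minimal sketch that runs them on the example name from the question. The helper strip_suffix is hypothetical (it is not part of the original script); it only guards the slice so that a name without the expected ending is left unchanged.

```python
infile = "Genome_file_getorf.fsa_aa"

by_split = infile.split(".")[0]        # everything before the first "."
by_rsplit = infile.rsplit(".", 1)[0]   # everything before the last "."
by_slice = infile[:-7]                 # drop the last 7 characters (".fsa_aa")

# With a single "." in the name, all three give the same base name.
print(by_split, by_rsplit, by_slice)

# With several dots, split and rsplit behave differently.
multi = "Myfile.out.old"
print(multi.split(".")[0])      # Myfile
print(multi.rsplit(".", 1)[0])  # Myfile.out


def strip_suffix(name, suffix=".fsa_aa"):
    """Hypothetical helper: remove suffix only if the name really ends with it."""
    if name.endswith(suffix):
        return name[:-len(suffix)]
    return name


outfile = strip_suffix(infile) + "_all_ORF.fsa_aa"
print(outfile)  # Genome_file_getorf_all_ORF.fsa_aa
```

The guarded slice is the safest of the three when some inputs might not follow the naming convention, because a blind [:-7] would silently cut seven characters off any name.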

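Another option is to use Path from pathlib, and I would recommend this library. In that case you have to make a few other small changes to your code: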

import re, sys
from pathlib import Path # <- Here

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

infile = Path(sys.argv[1]) # <- Here
outfile = infile.stem + '_all_ORF.fsa_aa' # <- Here 
# And if you want outfile to be a Path in the same folder as infile, I would suggest instead
# outfile = infile.parent.joinpath(infile.stem + '_all_ORF.fsa_aa')

with open(outfile, "a") as file_object:
    for sequence in SeqIO.parse(infile, "fasta"):
       #do some stuff
       file_object.write(f'{sequence.description}_ORF_from_position_{h.start()},\n{sequence.seq[h_start:]}')
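As a small illustration of the pathlib approach, the sketch below builds the output name from the input path. The function name make_outfile and the "data" folder are assumptions made for the example only. Note that infile.stem on its own drops the directory, so a plain stem + suffix string would be written to the current working directory; with_name keeps the output next to the input.

```python
from pathlib import Path


def make_outfile(infile, suffix="_all_ORF.fsa_aa"):
    """Hypothetical helper: swap the last extension of infile for suffix."""
    infile = Path(infile)
    # stem is the final path component without its last extension,
    # e.g. "Genome_file_getorf" for "Genome_file_getorf.fsa_aa"
    return infile.with_name(infile.stem + suffix)


print(make_outfile("Genome_file_getorf.fsa_aa"))
print(make_outfile(Path("data") / "Genome_file_getorf.fsa_aa"))
```

The first call gives a path named Genome_file_getorf_all_ORF.fsa_aa; the second gives the same file name inside the data folder.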

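So... you want to replace part of a string? Then that is what you should look up.

Thank you for your help. I did not know where to look for this, and I kept getting stuck on modules and methods that rename the file entirely. I did not know how to slice a variable. You are right, I know the files will end in ".fsa_aa", so your suggestion of name = infile[:-7] works well. I will look into the pathlib library. May I ask why you would recommend that approach over splitting or slicing the string? Also, thanks for the Bash tip and for your suggestion to replace the print statement with the file_object.write method. I will use them :)

I suggest checking it out for the code you write in the future, essentially because it allows greater flexibility. Sometimes you need to check whether a file exists before trying to overwrite it; sometimes you need to create a folder structure and place the generated files in those folders in a particular way. pathlib is part of the Python standard library and makes some of these tasks easier.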
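The two tasks mentioned in the last comment (checking whether a file already exists, and creating a folder structure for the results) can be sketched with pathlib as follows. The function prepare_output, the overwrite flag and the "results/orfs" layout are all assumptions made for this illustration, not part of the original script.

```python
import tempfile
from pathlib import Path


def prepare_output(infile, out_dir, suffix="_all_ORF.fsa_aa", overwrite=False):
    """Hypothetical helper: create out_dir if needed and return the output path."""
    infile = Path(infile)
    out_dir = Path(out_dir)
    # Create the whole folder chain; do nothing if it already exists.
    out_dir.mkdir(parents=True, exist_ok=True)
    outfile = out_dir / (infile.stem + suffix)
    # Refuse to reuse an existing file unless the caller explicitly allows it.
    if outfile.exists() and not overwrite:
        raise FileExistsError(f"{outfile} already exists")
    return outfile


# Demonstration in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    out = prepare_output("Genome_file_getorf.fsa_aa", Path(tmp) / "results" / "orfs")
    out.write_text(">example\nMKT\n")
    print(out.name)  # Genome_file_getorf_all_ORF.fsa_aa
```

Because the original script opens its output in append mode ("a"), a check like this is one way to avoid accidentally appending to the results of an earlier run.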

name = infile[:-7]  # infile is the filename string; "input" would be Python's built-in function

outfile = f'{name}_all_ORF.fsa_aa' 
# if you wrote f'{variable}' you don't need the ".format()"
# On the other hand you can do '{}'.format(variable)
# or even '{variable}'.format(variable=SomeOtherVariable)

with open(outfile, "a") as file_object:
    for sequence in SeqIO.parse(infile, "fasta"):
       #do some stuff
       file_object.write(f'{sequence.description}_ORF_from_position_{h.start()},\n{sequence.seq[h_start:]}')
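The three formatting styles mentioned in the comments above produce the same string, as this short sketch shows. The value assigned to name is just the example base name from the question.

```python
name = "Genome_file_getorf"

with_fstring = f"{name}_all_ORF.fsa_aa"                    # f-string, no .format() needed
with_format = "{}_all_ORF.fsa_aa".format(name)             # positional placeholder
with_keyword = "{base}_all_ORF.fsa_aa".format(base=name)   # named placeholder

print(with_fstring)
print(with_fstring == with_format == with_keyword)  # True
```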
