Replacing a string in a specific line using Python


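I'm writing a Python script that replaces strings in every text file with a specific extension (.seq) in a directory. The replacement should only be applied to the second line of each file, and the output goes to a new subdirectory (called clean), with the same file names as the originals plus a *.clean suffix. The output files contain exactly the same text as the originals, but with the strings replaced. I need to replace all of these strings: 'K', 'Y', 'W', 'M', 'R', 'S' with 'N'.

This is what I came up with after googling. It's very messy (second week of programming), and it stops after copying the files into the clean directory, without replacing anything. I'd really appreciate any help.

Thank you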

import os, shutil

os.mkdir('clean')

for file in os.listdir(os.getcwd()):
    if file.find('.seq') != -1:
        shutil.copy(file, 'clean')

os.chdir('clean')

for subdir, dirs, files in os.walk(os.getcwd()):
    for file in files:
        f = open(file, 'r')
        for line in f.read():
            if line.__contains__('>'): #indicator for the first line. the first line always starts with '>'. It's a FASTA file, if you've worked with dna/protein before.
                pass
            else:
                line.replace('M', 'N')
                line.replace('K', 'N')
                line.replace('Y', 'N')
                line.replace('W', 'N')
                line.replace('R', 'N')
                line.replace('S', 'N')

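line.replace is not a mutator: it leaves the original string unchanged and returns a new string with the replacement applied. You need to change your code to

line = line.replace('R', 'N')

and so on.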
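
To make the point above concrete, here is a minimal sketch (written for Python 3; the sample sequence is made up purely for illustration) showing that the result of replace is lost unless it is assigned back:

```python
# Hypothetical sample sequence, used only for illustration.
line = "ACGTRYKM"

# Calling replace without keeping the result changes nothing.
line.replace("R", "N")
unchanged = line

# Assigning the result back keeps each replacement.
for ch in "KYWMRS":
    line = line.replace(ch, "N")
cleaned = line
```

After this runs, unchanged still holds the original text, while cleaned has every listed letter replaced by N.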

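I think you also need to add a break statement at the end of the else clause, so that you stop after processing line 2 instead of iterating over the whole file.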

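Finally, you need to actually write out a file containing the changes. So far, you are only reading the file and updating the line held in the program variable 'line'. You also need to create an output file and write the modified lines to it.

You need to assign the result of the replacement back to the 'line' variable.

You can also use the fileinput module for in-place editing: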

import os, shutil,fileinput
if not os.path.exists('clean'):
    os.mkdir('clean')

for file in os.listdir("."):
    if file.endswith(".seq"):
        shutil.copy(file, 'clean')

os.chdir('clean')

for subdir, dirs, files in os.walk("."):
    for file in files:
        f = fileinput.FileInput(file,inplace=0)
        for n,line in enumerate(f):
            if line.lstrip().startswith('>'):
                pass
            elif n==1: #replace 2nd line
                for ch in ["M","K","Y","W","R","S"]:
                    line=line.replace(ch, 'N')
            print(line.rstrip())
        f.close()

Change inplace=0 to inplace=1 to edit the files in place.

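You should replace

line.replace('M', 'N')

with

line = line.replace('M', 'N')

replace returns a copy of the original string with the relevant substrings replaced.

A better way (IMO) is to use re (regular expressions):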

import re

line="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
line=re.sub("K|Y|W|M|R|S",'N',line)
print(line)
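
As a side note, the alternation in that pattern can also be written as a character class, and for single-character substitutions str.translate is another option. A small sketch, assuming Python 3 (str.maketrans does not exist under that name in Python 2):

```python
import re

line = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Character class: matches any one of the listed letters.
with_class = re.sub("[KYWMRS]", "N", line)

# Translation table mapping each listed letter to N (Python 3).
table = str.maketrans("KYWMRS", "NNNNNN")
with_translate = line.translate(table)
```

Both variables should end up with the same text as the re.sub call with the K|Y|W|M|R|S pattern.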

Here are some general tips:

Don't use find to check the file extension (for example, it would also match "file1.seqdata.xls"). At least use file.endswith('seq'), or better yet, os.path.splitext(file)[1].
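
To illustrate the difference between the three checks, here is a short sketch. The file names are invented for the example, and the endswith check uses '.seq' with the leading dot:

```python
import os

# Hypothetical file names, used only for illustration.
names = ["sample.seq", "file1.seqdata.xls"]

# find matches '.seq' anywhere in the name, so both names pass.
by_find = [n for n in names if n.find(".seq") != -1]

# endswith only accepts names that end in '.seq'.
by_endswith = [n for n in names if n.endswith(".seq")]

# splitext compares the real extension (the part after the last dot).
by_splitext = [n for n in names if os.path.splitext(n)[1] == ".seq"]
```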

Actually, don't do it that way at all. This is what you want:

import glob
seq_files = glob.glob("*.seq")

Don't copy the files; it's easier to use just one loop:

for filename in seq_files:
    in_file = open(filename)
    out_file = open(os.path.join("clean", filename), "w")
    # now read lines from in_file and write lines to out_file
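
One way the loop body might be filled in is sketched below, for Python 3. The function name clean_seq_files and its parameters are made up for this example, and appending ".clean" to the original file name is an assumption about what the question means by the *.clean suffix:

```python
import glob
import os

def clean_seq_files(src_dir=".", dst_dir="clean"):
    """Write a cleaned copy of each *.seq file in src_dir to dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    written = []
    for path in sorted(glob.glob(os.path.join(src_dir, "*.seq"))):
        # Assumed naming scheme: original name plus ".clean".
        out_path = os.path.join(dst_dir, os.path.basename(path) + ".clean")
        with open(path) as in_file, open(out_path, "w") as out_file:
            for n, line in enumerate(in_file):
                if n == 1:  # second line only (index is zero-based)
                    for ch in "KYWMRS":
                        line = line.replace(ch, "N")
                out_file.write(line)
        written.append(out_path)
    return written
```

Calling clean_seq_files() from the directory holding the .seq files would create clean/ if needed and return the list of files it wrote. Every line other than the second is copied through unchanged.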

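Don't use line.__contains__('>'). What you mean is

if '>' in line:

(which calls __contains__ internally). But really, you want to know whether the line starts with '>', not whether there is a '>' anywhere in the line. So a better way is:
if line.startswith(">"):

I'm not familiar with your file type; if the '>' check is really only there to identify the first line, there are better ways to do that.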
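
One such way, as a rough sketch: pick out the lines by position instead of by content. The example below assumes Python 3 and uses io.StringIO with an invented two-line record in place of a real file:

```python
import io

# Hypothetical two-line FASTA-style record, used only for illustration.
fasta = io.StringIO(">seq1 sample\nACGTRYKM\n")

header = fasta.readline()    # first line, whatever it contains
sequence = fasta.readline()  # second line: the one to clean
rest = fasta.read()          # anything after the second line
```

With a real file, the same readline calls work on the object returned by open.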


You don't need the if block (where you just pass). It's cleaner to write
if not something:
    do_things()
other_stuff()

instead of
if something:
    pass
else:
    do_things()
other_stuff()


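Have fun learning Python!

Some notes:
- string.replace and re.sub do not work in place, so you should assign the return value back to the variable.
- glob.glob is better suited to finding files in a directory that match a given pattern.
- Maybe you should check whether the directory already exists before creating it (I'm only assuming here; this may not be the behavior you want).
- The with statement takes care of closing the file safely. If you don't want to use it, you have to use try/finally.
- In your example, you forgot to add the suffix (*.clean ;)).
- You never actually wrote the files. You can do it the way I did in my example, or use the fileinput module (which I didn't know about until today).

Here is my example: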
import re
import os
import glob

source_dir=os.getcwd()
target_dir="clean"
source_files = [fname for fname in glob.glob(os.path.join(source_dir,"*.seq"))]

# check if target directory exists... if not, create it.
if not os.path.exists(target_dir):
    os.makedirs(target_dir)

for source_file in source_files:
   target_file = os.path.join(target_dir,os.path.basename(source_file)+".clean")
   with open(source_file,'r') as sfile:
      with open(target_file,'w') as tfile:
         lines = sfile.readlines()
         # do the replacement in the second line.
         # (remember that arrays are zero indexed)
         lines[1]=re.sub("K|Y|W|M|R|S",'N',lines[1])
         tfile.writelines(lines)

print("DONE")

Hope that helps. You could also compile the re first.
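
A minimal sketch of what compiling first could look like (the names AMBIGUOUS and clean_line are invented for this example, and the character class [KYWMRS] stands in for the K|Y|W|M|R|S alternation):

```python
import re

# Compile the pattern once and reuse it for every file.
AMBIGUOUS = re.compile("[KYWMRS]")

def clean_line(line):
    """Return line with every K, Y, W, M, R or S replaced by N."""
    return AMBIGUOUS.sub("N", line)
```

In the example above, the re.sub call on lines[1] would then become lines[1] = clean_line(lines[1]).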