Python: how to iterate over files and replace text [python, python-2.6]

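I am a Python beginner: how can I iterate over the csv files in a directory and replace strings, e.g.

ww into vv
.. into --

So I do not want to replace a whole line that contains ww with vv; I only want to replace the string within that line. I tried something like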

#!/Python26/
# -*- coding: utf-8 -*-

import os, sys
for f in os.listdir(path):
    lines = f.readlines()

but how do I proceed from here?

import os
import csv

# `path` is assumed to hold the directory that contains the csv files.
for filename in os.listdir(path):
    with open(os.path.join(path, filename), 'r') as f:
        for row in csv.reader(f):
            cells = [ cell.replace('ww', 'vv').replace('..', '--')
                      for cell in row ]
            # now you have a list of cells within one row
            # with all strings modified; nothing has been
            # written back to the file yet.
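
The loop above only builds the modified cells; as the comments further down point out, the data still has to be written back. A minimal sketch of one way to do that, written for Python 3 rather than the Python 2.6 of the question, is shown here. The function names `replace_in_csv` and `replace_in_directory` are hypothetical, and the sketch assumes that every file ending in `.csv` should be rewritten through a temporary file in the same directory.

```python
import csv
import os
import tempfile


def replace_in_csv(csv_path, replacements):
    """Rewrite one CSV file, applying each (old, new) pair to every cell."""
    directory = os.path.dirname(os.path.abspath(csv_path))
    # Write to a temporary file next to the original, then swap it in.
    with open(csv_path, newline='') as src, tempfile.NamedTemporaryFile(
            'w', newline='', dir=directory, delete=False) as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            cells = []
            for cell in row:
                for old, new in replacements:
                    cell = cell.replace(old, new)
                cells.append(cell)
            writer.writerow(cells)
    os.replace(dst.name, csv_path)


def replace_in_directory(path, replacements):
    """Apply replace_in_csv to every .csv file directly inside path."""
    for filename in os.listdir(path):
        if filename.endswith('.csv'):
            replace_in_csv(os.path.join(path, filename), replacements)
```

A call such as `replace_in_directory(path, [('ww', 'vv'), ('..', '--')])` would then cover the two replacements from the question. Note that `csv.writer` re-serializes each row, so quoting and line endings may differ from the original file.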

Edit: is this for you to learn/practice Python, or do you just need to get the job done? In the latter case, use the sed program:

sed -i 's/ww/vv/g' yourPath/*csv
sed -i 's/\.\./--/g' yourPath/*csv

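When you want to replace strings with strings of the same length, the replacement can be done in place, that is, by overwriting only the bytes that must change, without writing out the whole modified file again.

So, with a regular expression, this is easy to do. The fact that the file is a CSV file does not matter at all for this approach: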

from os import listdir
from os.path import join
import re
pat = re.compile('ww|\.\.')
dicrepl = {'ww':'vv' , '..':'--'}

for filename in listdir(path):
    with open(join(path,filename),'rb+') as f:
        ch = f.read()
        f.seek(0,0)
        pos = 0
        for match in pat.finditer(ch):
            f.seek(match.start()-pos, 1)
            f.write(dicrepl[match.group()])
            pos = match.end()

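Doing this processing in binary mode is absolutely necessary: that is the 'b' in the 'rb+' mode.

The file is opened in 'r+' mode, which allows reading and writing at any desired position in it (if the file were opened in 'a' mode, we could only write at the end of the file).

However, if the file is so big that the ch object would consume too much memory, the code should be modified.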
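
One possible modification for large files, which is not part of the answer above, is to memory-map the file instead of reading it into a string. The sketch below is written for Python 3 and assumes same-length replacements, as in the rest of this answer; the names `PATTERN`, `REPLACEMENTS` and `replace_in_place` are hypothetical.

```python
import mmap
import os
import re

# Bytes pattern and table, because the file is handled as raw bytes.
PATTERN = re.compile(rb'ww|\.\.')
REPLACEMENTS = {b'ww': b'vv', b'..': b'--'}


def replace_in_place(file_path):
    """Overwrite each match with its same-length replacement; return the count."""
    if os.path.getsize(file_path) == 0:
        return 0  # an empty file cannot be memory-mapped
    count = 0
    with open(file_path, 'r+b') as f:
        with mmap.mmap(f.fileno(), 0) as mm:
            for match in PATTERN.finditer(mm):
                # Same length on both sides, so the file size never changes.
                mm[match.start():match.end()] = REPLACEMENTS[match.group()]
                count += 1
            mm.flush()
    return count
```

With a memory map the operating system pages the file in as needed, so the whole content does not have to sit in a Python string at once.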

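If the replacement strings do not have the same length as the original strings, a new file containing the modifications must be written. (If the replacement string is always shorter than the string it replaces, that is a special case, and it can still be handled without writing a new file. This can be interesting for big files.)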
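
For the different-length case, a minimal sketch of the "write a new file" route could look like the following. It is written for Python 3, the function name `replace_via_new_file` is hypothetical, and it assumes that no match spans a line break, so the file can be streamed line by line.

```python
import os
import re
import tempfile


def replace_via_new_file(file_path, pattern, table):
    """Stream file_path into a temporary file with replacements, then swap it in."""
    directory = os.path.dirname(os.path.abspath(file_path))
    with open(file_path, 'rb') as src, tempfile.NamedTemporaryFile(
            'wb', dir=directory, delete=False) as dst:
        for line in src:
            # table maps each matched byte string to its replacement.
            dst.write(pattern.sub(lambda m: table[m.group()], line))
    os.replace(dst.name, file_path)
```

For example, `replace_via_new_file(p, re.compile(rb'ww|\.\.'), {b'ww': b'vvvv', b'..': b'-'})` would lengthen one kind of match and shorten the other in the same pass.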

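The interest of doing f.seek(match.start()-pos, 1) instead of f.seek(match.start(), 0) is that it moves the pointer from position pos directly to position match.start(), without having to move it from position 0 to match.start() every time.

Conversely, with f.seek(match.start(), 0), the pointer must first be moved back to position 0 (the beginning of the file) and then moved forward by match.start() characters to stop at the right position, because seek(..., 0) means the position is taken from the beginning of the file, while seek(..., 1) means the move is made from the current position.

Edit:
If you want to replace only isolated 'ww' strings, and not 'ww' chunks inside a longer string such as 'wwww', the regex must be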

pat = re.compile('(?<!w)ww(?!w)|(?<!\.)\.\.(?!\.)')

This cannot be obtained with replace() without complicated string manipulation.
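
To make the difference concrete, here is a small sketch, written for Python 3, that compares the lookaround regex with chained replace() calls on an in-memory string. The names `ISOLATED`, `REPLACEMENTS` and `replace_isolated` and the sample string are hypothetical.

```python
import re

# Same pattern as above: 'ww' not touching another 'w', '..' not touching another '.'.
ISOLATED = re.compile(r'(?<!w)ww(?!w)|(?<!\.)\.\.(?!\.)')
REPLACEMENTS = {'ww': 'vv', '..': '--'}


def replace_isolated(text):
    """Replace only isolated 'ww' and '..' occurrences."""
    return ISOLATED.sub(lambda m: REPLACEMENTS[m.group()], text)


sample = 'ww,wwww,a..b,c...d'
print(replace_isolated(sample))                        # vv,wwww,a--b,c...d
print(sample.replace('ww', 'vv').replace('..', '--'))  # vv,vvvv,a--b,c--.d
```

The regex leaves 'wwww' and '...' untouched, whereas plain replace() also rewrites the chunks inside them.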
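Edit:
I had forgotten the f.seek(0,0) instruction after f.read(). This instruction is necessary to move the file's pointer back to the beginning of the file, because the pointer is moved all the way to the end during the reading.
I have corrected the code, and now it works.
Here is a version of the code that traces what is being processed: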
from os import listdir
from os.path import join
import re
pat = re.compile('(?<!w)ww(?!w)|(?<!\.)\.\.(?!\.)')
dicrepl = {'ww':'vv' , '..':'ZZ'}

path = ...................................

with open(path,'rb+') as f:
    print "file has just been opened, file's pointer is at position ",f.tell()
    print '- reading of the file : ch = f.read()'
    ch = f.read()
    print "file has just been read"+\
          "\nfile's pointer is now at position ",f.tell(),' , the end of the file'
    print "- file's pointer is moved back to the beginning of the file : f.seek(0,0)"
    f.seek(0,0)
    print "file's pointer is now again at position ",f.tell()
    pos = 0
    print '\n- process of replacement is now launched :'
    for match in pat.finditer(ch):
        print
        print 'is at position ',f.tell()
        print 'group ',match.group(),' detected on span ',match.span()
        f.seek(match.start()-pos, 1)
        print 'pointer having been moved on position ',f.tell()
        f.write(dicrepl[match.group()])
        print 'detected group has been replaced with ',dicrepl[match.group()]
        print 'now at position ',f.tell()
        pos = match.end()

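See the other answers for information on replacing strings. I want to add more information about iterating over files, which is the first part of the question.
If you want to recurse through a directory and all its subdirectories, use os.walk(). os.listdir() does not recurse, nor does it include the directory name in the file names it produces. Use os.path.join() to form a more complete pathname.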
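
As a minimal sketch of recursive iteration with os.walk() and os.path.join(), written for Python 3: the generator name `iter_csv_paths` is hypothetical, and filtering on the `.csv` extension is an assumption based on the question.

```python
import os


def iter_csv_paths(root):
    """Yield the full path of every .csv file under root, recursively."""
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.lower().endswith('.csv'):
                # os.walk gives bare file names; join them with their directory.
                yield os.path.join(dirpath, filename)
```

Each yielded path can be passed directly to open(), which avoids the "No such file or directory" error that bare file names cause.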
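Do you want to replace some lines with other lines? Do the replacement lines have the same length as the lines they replace? If not, the replacement causes many problems, because we cannot push bits apart to insert a block of bits at one place: bits are physical elements. Secondly, do you want to do the replacement in one file or in several? If you write for f in os.listdir(path), f will successively be each file in the directory path. Do you want lines to go from one file to another? Where do the replacement lines come from? And so on... Your question must be more precise.

I want to make these replacements in several files. It is just string conversion within a file.

Sorry, I did not pay enough attention to the first line. Yes, in fact it is very clear.

Thanks! I somehow get an error: IOError: [Errno 2] No such file or directory: 'file1.csv'. Any idea what causes this?

@atricapilla - Sorry, I have edited it; it should now work with path and filename joined.

Thanks, this runs fine, but it does not seem to do the string replacement?

@atricapilla - OK, you have to write that data back. But see my comment about sed (...)

- This is probably not how you want to access the dictionary.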