How do I do sed-like text replacement with Python?


I want to enable all the apt repositories in this file:

cat /etc/apt/sources.list
## Note, this file is written by cloud-init on first boot of an instance                                                                                                            
## modifications made here will not survive a re-bundle.                                                                                                                            
## if you wish to make changes you can:                                                                                                                                             
## a.) add 'apt_preserve_sources_list: true' to /etc/cloud/cloud.cfg                                                                                                                
##     or do the same in user-data
## b.) add sources in /etc/apt/sources.list.d                                                                                                                                       
#                                                                                                                                                                                   

# See http://help.ubuntu.com/community/UpgradeNotes for how to upgrade to                                                                                                           
# newer versions of the distribution.                                                                                                                                               
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick main                                                                                                                   
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick main                                                                                                               

## Major bug fix updates produced after the final release of the                                                                                                                    
## distribution.                                                                                                                                                                    
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates main                                                                                                           
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates main                                                                                                       

## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu                                                                                                         
## team. Also, please note that software in universe WILL NOT receive any                                                                                                           
## review or updates from the Ubuntu security team.                                                                                                                                 
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick universe                                                                                                               
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick universe                                                                                                           
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates universe
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates universe

## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu 
## team, and may not be under a free licence. Please satisfy yourself as to
## your rights to use the software. Also, please note that software in 
## multiverse WILL NOT receive any review or updates from the Ubuntu
## security team.
# deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick multiverse
# deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick multiverse
# deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates multiverse
# deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates multiverse

## Uncomment the following two lines to add software from the 'backports'
## repository.
## N.B. software from this repository may not have been tested as
## extensively as that contained in the main release, although it includes
## newer versions of some applications which may provide useful features.
## Also, please note that software in backports WILL NOT receive any review
## or updates from the Ubuntu security team.
# deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-backports main restricted universe multiverse
# deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-backports main restricted universe multiverse

## Uncomment the following two lines to add software from Canonical's
## 'partner' repository.
## This software is not part of Ubuntu, but is offered by Canonical and the
## respective vendors as a service to Ubuntu users.
# deb http://archive.canonical.com/ubuntu maverick partner
# deb-src http://archive.canonical.com/ubuntu maverick partner

deb http://security.ubuntu.com/ubuntu maverick-security main
deb-src http://security.ubuntu.com/ubuntu maverick-security main
deb http://security.ubuntu.com/ubuntu maverick-security universe
deb-src http://security.ubuntu.com/ubuntu maverick-security universe
# deb http://security.ubuntu.com/ubuntu maverick-security multiverse
# deb-src http://security.ubuntu.com/ubuntu maverick-security multiverse

With sed, this is a simple
sed -i 's/^# deb/deb/' /etc/apt/sources.list
What is the most elegant ("pythonic") way to do this?

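Not sure about elegant, but this should at least be pretty readable. For a sources.list it is fine to read all the lines up front; for something larger you might want to change it "in place" while looping through it.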

#!/usr/bin/env python
# Open file for reading and writing
with open("sources.list", "r+") as sources_file:
    # Read all the lines
    lines = sources_file.readlines()

    # Rewind and truncate
    sources_file.seek(0)
    sources_file.truncate()

    # Loop through the lines, adding them back to the file.
    for line in lines:
        if line.startswith("# deb"):
            sources_file.write(line[2:])
        else:
            sources_file.write(line)

Edit: Use a with-statement for better file handling. I had also forgotten to rewind before truncating.
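
For larger files, the "in place while looping" idea can be sketched with the standard library's fileinput module, which redirects printed output into the file being read. This is only a minimal sketch; the function name uncomment_deb_lines is an illustrative assumption, not part of the answer above.

```python
import fileinput


def uncomment_deb_lines(path):
    """Strip the leading "# " from lines that start with "# deb", in place."""
    # With inplace=True, fileinput moves the original aside and redirects
    # standard output into a new file at the same path while we iterate.
    with fileinput.input(files=[path], inplace=True) as stream:
        for line in stream:
            if line.startswith("# deb"):
                line = line[2:]
            # Each line keeps its own newline, so suppress print's.
            print(line, end="")
```

Only one line is held in memory at a time, at the cost of the file being briefly incomplete while the loop runs.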

You can do it like this:

import re

with open("/etc/apt/sources.list", "r") as sources:
    lines = sources.readlines()
with open("/etc/apt/sources.list", "w") as sources:
    for line in lines:
        sources.write(re.sub(r'^# deb', 'deb', line))

The with statement ensures that the file is closed correctly, and re-opening the file in "w" mode empties the file before you write to it. re.sub(pattern, replace, string) is the equivalent of s/pattern/replace/ in sed/perl.


Edit: fixed the syntax in the example.
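
One difference is worth keeping in mind when carrying a sed expression over: a plain s/pattern/replace/ replaces only the first match on each line, whereas re.sub replaces every match unless count is given. A small sketch with made-up sample text:

```python
import re

line = "aaa bbb aaa\n"

# Like sed 's/aaa/X/': only the first match on the line is replaced.
first_only = re.sub(r"aaa", "X", line, count=1)

# Like sed 's/aaa/X/g': every match is replaced (re.sub's default).
every_match = re.sub(r"aaa", "X", line)

print(first_only, end="")   # X bbb aaa
print(every_match, end="")  # X bbb X
```

For a pattern anchored with ^, such as the one in the question, the two behave the same because it can match at most once per line.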

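You could do something like:

import re

# Raw string avoids the invalid "\#" escape; " *" tolerates any number of spaces.
p = re.compile(r"^# *deb", re.MULTILINE)
with open("sources.list", "r") as f:
    text = f.read()
with open("sources.list", "w") as f:
    f.write(p.sub("deb", text))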

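Or (imho, this is better from an organisational standpoint) you could split your sources.list into several parts (one entry per repository) and place them under
/etc/apt/sources.list.d/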
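
A minimal sketch of that approach, assuming one file per repository. The helper name write_source_entry and the file name in the usage note are illustrative assumptions; apt only picks up files in that directory whose names end in .list.

```python
import os


def write_source_entry(directory, name, entry):
    """Write a single repository entry to <directory>/<name>.list."""
    path = os.path.join(directory, name + ".list")
    with open(path, "w") as source_file:
        # Normalise to exactly one trailing newline.
        source_file.write(entry.rstrip("\n") + "\n")
    return path
```

For example, write_source_entry("/etc/apt/sources.list.d", "partner", "deb http://archive.canonical.com/ubuntu maverick partner") would enable the partner repository without touching sources.list (it needs root privileges to write there).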

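This is such a different approach that I don't want to edit my other answer. The with statements are nested because I don't use 3.1 (where with A() as a, B() as b: works).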

It might be a bit overkill for changing sources.list, but I wanted to post it for future searches.

#!/usr/bin/env python
from shutil   import move
from tempfile import NamedTemporaryFile

with NamedTemporaryFile(delete=False) as tmp_sources:
    with open("sources.list") as sources_file:
        for line in sources_file:
            if line.startswith("# deb"):
                tmp_sources.write(line[2:])
            else:
                tmp_sources.write(line)

move(tmp_sources.name, sources_file.name)
This should ensure there are no race conditions for others reading the file. Oh, and I prefer str.startswith(...) when you can do without regexps.
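
To illustrate the point about race conditions: a rename within one filesystem swaps the file in a single step, so readers see either the old content or the new content, never a half-written file. The following is a sketch assuming Python 3.3+ (for os.replace) and a temporary file created next to the target; the function name replace_atomically is made up for this example.

```python
import os
import tempfile


def replace_atomically(path, transform):
    """Rewrite `path` line by line through `transform`, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so that the final
    # rename does not cross a filesystem boundary.
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        with open(path) as source:
            for line in source:
                tmp.write(transform(line))
    # Swap the finished file into place in one step.
    os.replace(tmp.name, path)
```

Note that this sketch does not carry over the original file's permissions; the new file keeps the restrictive mode that tempfile gave it.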

massedit.py handles the scaffolding for you, leaving just the regex to write. It is still in beta, but we are looking for feedback.

python -m massedit -e "re.sub(r'^# deb', 'deb', line)" /etc/apt/sources.list
will show the differences (before/after) in diff format.

Add the -w option to write the changes to the original file:

python -m massedit -e "re.sub(r'^# deb', 'deb', line)" -w /etc/apt/sources.list
Alternatively, you can now use the API:

>>> import massedit
>>> filenames = ['/etc/apt/sources.list']
>>> massedit.edit_files(filenames, ["re.sub(r'^# deb', 'deb', line)"], dry_run=True)

Here is a one-module Python replacement for perl -p:

# Provide compatibility with `perl -p`

# Usage:
#
#     python -mloop_over_stdin_lines '<program>'

# In, `<program>`, use the variable `line` to read and change the current line.

# Example:
#
#         python -mloop_over_stdin_lines 'line = re.sub("pattern", "replacement", line)'

# From the perlrun documentation:
#
#        -p   causes Perl to assume the following loop around your
#             program, which makes it iterate over filename arguments
#             somewhat like sed:
# 
#               LINE:
#                 while (<>) {
#                     ...             # your program goes here
#                 } continue {
#                     print or die "-p destination: $!\n";
#                 }
# 
#             If a file named by an argument cannot be opened for some
#             reason, Perl warns you about it, and moves on to the next
#             file. Note that the lines are printed automatically. An
#             error occurring during printing is treated as fatal. To
#             suppress printing use the -n switch. A -p overrides a -n
#             switch.
# 
#             "BEGIN" and "END" blocks may be used to capture control
#             before or after the implicit loop, just as in awk.
# 

import re
import sys

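for line in sys.stdin:
    # At module level locals() is globals(), so the program can rebind `line`.
    exec(sys.argv[1], globals(), locals())
    try:
        sys.stdout.write(line)
    except IOError as error:
        sys.exit('-p destination: %s\n' % error)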

If you are using Python3, the following module will help you:

Place the module file in your Python3 modules path, then:

import pysed
pysed.replace(<Old string>, <Replacement String>, <Text File>)
pysed.rmlinematch(<Unwanted string>, <Text File>)
pysed.rmlinenumber(<Unwanted Line Number>, <Text File>)
Try:

pysed -r '# deb' 'deb' /etc/apt/sources.list

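If you really want to use a sed command without installing a new Python module, you could simply do the following:

import subprocess
# "sed command" is a placeholder for the full command line; a single
# command string like this needs shell=True to be parsed by the shell.
subprocess.call("sed command", shell=True)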
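
A slightly more defensive sketch passes the command as an argument list, which avoids shell quoting problems with the sed expression. The helper names build_sed_command and run_sed are assumptions for illustration, and -i without a suffix, as used in the question, assumes GNU sed (BSD/macOS sed expects a suffix argument after -i).

```python
import subprocess


def build_sed_command(expression, path):
    """Return the argument list for an in-place sed run."""
    # Each list element reaches sed as one argument, with no shell involved.
    return ["sed", "-i", expression, path]


def run_sed(expression, path):
    """Run sed in place on `path`; raises CalledProcessError on failure."""
    subprocess.run(build_sed_command(expression, path), check=True)
```

For the question's file this would be called as run_sed("s/^# deb/deb/", "/etc/apt/sources.list").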

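Authoring a homegrown sed replacement in pure Python, with no external commands or additional dependencies, is a noble task laden with noble landmines. Who would have thought?

Nonetheless, it is feasible. It is also desirable. We have all been there, people: "I need to munge some plain-text files, but I only have Python, two plastic shoelaces, and a moldy can of bunker-grade Maraschino cherries. Help."

In this answer, we offer a best-of-breed solution that cobbles together the awesomeness of the prior answers without all of the unpleasant parts. As plundra notes, David Miller's answer writes the desired file non-atomically and hence invites race conditions (e.g., from other threads and/or processes attempting to read that file concurrently). That's bad. Plundra's answer solves that issue while introducing yet more: numerous fatal encoding errors, a critical security vulnerability (failing to preserve the permissions and other metadata of the original file), and premature optimization that replaces regular expressions with low-level character indexing. That's also bad.

Awesomeness, unite!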

import re, shutil, tempfile

def sed_inplace(filename, pattern, repl):
    '''
    Perform the pure-Python equivalent of in-place `sed` substitution: e.g.,
    `sed -i -e 's/'${pattern}'/'${repl}' "${filename}"`.
    '''
    # For efficiency, precompile the passed regular expression.
    pattern_compiled = re.compile(pattern)

    # For portability, NamedTemporaryFile() defaults to mode "w+b" (i.e., binary
    # writing with updating). This is usually a good thing. In this case,
    # however, binary writing imposes non-trivial encoding constraints trivially
    # resolved by switching to text writing. Let's do that.
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
        with open(filename) as src_file:
            for line in src_file:
                tmp_file.write(pattern_compiled.sub(repl, line))

    # Overwrite the original file with the munged temporary file in a
    # manner preserving file attributes (e.g., permissions).
    shutil.copystat(filename, tmp_file.name)
    shutil.move(tmp_file.name, filename)

# Do it for Johnny.
sed_inplace('/etc/apt/sources.list', r'^\# deb', 'deb')
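
The permissions point can be checked in isolation. tempfile creates files that only the owner can read and write, so without shutil.copystat the replaced file would silently end up with mode 0600 on POSIX systems. A small sketch, assuming a POSIX system (the helper name mode_of is an assumption for illustration):

```python
import os
import shutil
import stat
import tempfile


def mode_of(path):
    """Return just the permission bits of `path`."""
    return stat.S_IMODE(os.stat(path).st_mode)


# Stand-in for the original file, with typical 0644 permissions.
with tempfile.NamedTemporaryFile(delete=False) as original:
    pass
os.chmod(original.name, 0o644)

# Stand-in for the rewritten copy, created with tempfile's default mode.
with tempfile.NamedTemporaryFile(delete=False) as rewritten:
    pass

mode_before = mode_of(rewritten.name)
shutil.copystat(original.name, rewritten.name)
mode_after = mode_of(rewritten.name)

os.remove(original.name)
os.remove(rewritten.name)
```

After the copystat call, mode_after matches the original file's mode. copystat copies permission bits, timestamps, and flags, but not the owner or group.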

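I wanted to be able to find and replace text, but also to include matched groups in the content I insert. I wrote this short script to do that:

The key part of it looks like this: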

print(re.sub(pattern, template, text).rstrip("\n"))
Here is an example of how it works:

# Find everything that looks like 'dog' or 'cat' followed by a space and a number
pattern = r"((cat|dog) (\d+))"

# Replace with 'turtle' and the number. '3' because the number is the 3rd matched group.
# The double '\' is needed because you need to escape '\' when running this in a python shell
template = "turtle \\3"

# The text to operate on
text = "cat 976 is my favorite"
Running the line above with these values produces:

turtle 976 is my favorite
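
As a variation on the example above, named groups and raw strings can make the template easier to read: \g<name> refers to a group by name, and a raw string removes the need for the doubled backslash. The group names animal and number are assumptions chosen for this sketch.

```python
import re

# Same idea as above, but the groups are named instead of numbered.
pattern = r"(?P<animal>cat|dog) (?P<number>\d+)"

# \g<number> inserts whatever the group called "number" matched.
template = r"turtle \g<number>"

text = "cat 976 is my favorite"

result = re.sub(pattern, template, text)
print(result)  # turtle 976 is my favorite
```
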
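The sed_inplace answer above is a great one, but it only works for single-line regular expressions. Multiline regular expressions are used less often, but they are handy sometimes.

Here is an improvement to that sed_inplace function that lets it work with multiline regular expressions when asked to.

WARNING: In multiline mode it reads the entire file into memory and then performs the regular expression substitution, so you will only want to use this mode on small-ish files. Don't try to run it on gigabyte-sized files in multiline mode.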

import re, shutil, tempfile

def sed_inplace(filename, pattern, repl, multiline = False):
    '''
    Perform the pure-Python equivalent of in-place `sed` substitution: e.g.,
    `sed -i -e 's/'${pattern}'/'${repl}' "${filename}"`.
    '''
    re_flags = 0
    if multiline:
        re_flags = re.M

    # For efficiency, precompile the passed regular expression.
    pattern_compiled = re.compile(pattern, re_flags)

    # For portability, NamedTemporaryFile() defaults to mode "w+b" (i.e., binary
    # writing with updating). This is usually a good thing. In this case,
    # however, binary writing imposes non-trivial encoding constraints trivially
    # resolved by switching to text writing. Let's do that.
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
        with open(filename) as src_file:
            if multiline:
                content = src_file.read()
                tmp_file.write(pattern_compiled.sub(repl, content))
            else:
                for line in src_file:
                    tmp_file.write(pattern_compiled.sub(repl, line))

    # Overwrite the original file with the munged temporary file in a
    # manner preserving file attributes (e.g., permissions).
    shutil.copystat(filename, tmp_file.name)
    shutil.move(tmp_file.name, filename)

from os.path import expanduser
sed_inplace('%s/.gitconfig' % expanduser("~"), r'^(\[user\]$\n[ \t]*name = ).*$(\n[ \t]*email = ).*', r'\1John Doe\2jdoe@example.com', multiline=True)
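
To see why the multiline switch has to read the whole file, compare applying a newline-spanning pattern line by line with applying it to the full text. This is a small sketch with made-up sample text, not part of the function above.

```python
import re

text = "[user]\n    name = old\n"
pattern = re.compile(r"^(\[user\]$\n[ \t]*name = ).*$", re.M)

# Line by line: no single line contains both "[user]" and "name = ",
# so the pattern never matches and the text comes back unchanged.
per_line = "".join(pattern.sub(r"\1new", line)
                   for line in text.splitlines(keepends=True))

# Whole text at once: the pattern can match across the newline.
whole = pattern.sub(r"\1new", text)
```

Here per_line is identical to text, while whole has the name replaced.
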
If I want something like sed, I usually just call sed itself.
key1=value_tobe_replaced1
key2=value_tobe_replaced1
.     .
.     .
key1000=value_tobe_replaced1000
time costs: 0:00:42.533879
time costs: 0:00:00.348458