Line-by-line regex replacement in Python 3

python regex svg replace


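I have written some code that tries to do the following:

Open an SVG file that I retrieved earlier in my Python code
Look for a specific regular expression in the file: r"(?:xlink:href\=\")(.*)(?:\?q=80\"\/\>)"
If a match is found, replace the matched text with a specific string (in the code below, 'page-' + formattedpagenumber + '-img1.jpg')
Then retrieve the JPG from the matched URL (see the regex101.com link above)

However, this does not work: it empties the file completely (leaving it at 0 bytes).

I think I must be close to getting it working, but I'm not there yet. Any guidance would be appreciated.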

pagenumber=1
directory_in_str='/home/somewhere/somedir/'
pathlist = Path(directory_in_str).glob('**/*.svg')
for path in pathlist:
#because path is object not string
  path_in_str = str(path)   
  print(path_in_str)

  with open(path_in_str, 'r+') as f:
    for line in f:
        myregex = r"(?:xlink:href\=\")(.*)(?:\?q=80\"\/\>)"
        result = myregex.search(line)

        if result:
            #If a match is found, replace the text in the line          
            origfullimgfilename = result
            formattedpagenumber = '{:0>3}'.format(pagenumber)
            replfullimgfilename='page-'+str(formattedpagenumber)+'-img1.jpg'
            line = re.sub(origfullimgfilename, replfullimgfilename, line.rstrip())  
            #Then retrieve the file! (origfullimgfilename)
            try:
                urllib.request.urlretrieve(origfullimgfilename+"?q=100", replfullimgfilename)
            except urllib.error.HTTPError as e:
                print("HTTP Error: "+str(e.code)+" - SVG URL: "+str(origfullimgfilename)+" - aborting\n")
                break
pagenumber += 1


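This shouldn't happen. Posting the whole program and an input file would help others check it.

Hi @ben, which part shouldn't happen? Once the SVG files are in the directory, the code above runs as standalone code. I have split it out into a separate .py file because everything up to this point works.

@xdze2 That looks like it should work, but I want to do it "in place" in the SVG file. Will that code print out the whole file whether or not a line matches?

The only solution I have found is to loop over enumerate(list_of_lines) and then assign list_of_lines[i] = re.sub(...) (re.sub returns a copy of the string). Do you want in-place replacement for performance or memory reasons?

Not exactly, just to be neater. Thanks though, I'll try writing to a separate file this way.

Thanks, that works, but when I change the regex to myregex = re.compile("(?:xlink:href\=\")(.*?q=80)") it doesn't...

And use (?:xlink:href\=\")([^\"]*.jpg): I added .jpg because it also matches...

Please check result, using a modified version of the code:
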
lines = """<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Page 1 -->
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0" y="0" width="1198" height="1576" viewBox="0 0 1198 1576" version="1.1" style="display: block;margin-left: auto;margin-right: auto;">
<path d="M0,0 L0,1576 L1198,1576 L1198,0 Z " fill="#FFFFFF" stroke="none"/>
<image preserveAspectRatio="none" x="0" y="0" width="1199" height="1577" xlink:href="https://cdn-assets.somewhere.com/f929e7b4404d3e48918cdc0ecd4efbc9fa91dab5_2734/9c237c7e35efe88878f9e5f7a3555f7a379ed9ee9d95b491b6d0003fd80afc6b/9c68a28d6b5161f7e975a163ff093a81172916e23c778068c6bfefcfab195154.jpg?q=80"/>
<g fill="#112449">
<use xlink:href="#f0_2" transform="matrix(19.1,0,0,19.1,885.3,204.3)"/>
<use xlink:href="#f0_o" transform="matrix(19.1,0,0,19.1,910,204.3)"/>
<use xlink:href="#f0_t" transform="matrix(19.1,0,0,19.1,930.3,204.3)"/>
<use xlink:href="#f0_f" transform="matrix(19.1,0,0,19.1,949.6,204.3)"/>"""

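import re

# Compile once, outside the loop. Group 1 captures the URL up to,
# but not including, the "?q=80" suffix.
myregex = re.compile(r"(?:xlink:href\=\")(.*)(?:\?q=80\"\/\>)")

newlines = []
for line in lines.split('\n'):
    result = myregex.search(line)

    if result:
        print(result)
        # If a match is found, replace the text in the line
        origfullimgfilename = result.group(1)
        replfullimgfilename = 'page-##-img1.jpg'
        # Plain string replacement: the URL is literal text, not a regex pattern.
        # The "?q=80" suffix stays in the line because it is outside group 1.
        line = line.replace(origfullimgfilename, replfullimgfilename)

    newlines.append(line)

print('\n'.join(newlines))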
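
The comments ask about doing the replacement "in place" in the SVG file. Below is a minimal sketch of one way to do that, assuming each file fits comfortably in memory: read all lines, rewrite the matching ones through enumerate as suggested in the comments, and write the result back only once every line has been processed. The function name rewrite_svg_inplace, its pagenumber argument, and the narrowed pattern (the [^"]*.jpg idea from the comments, with the dot escaped and the "?q=80" suffix consumed) are illustrative assumptions, not part of the original answer. Downloading the image is left out.

```python
import re
from pathlib import Path

# Assumed, narrowed pattern: capture a .jpg URL that is followed by "?q=80".
# Lines such as <use xlink:href="#f0_2" .../> do not match.
HREF_RE = re.compile(r'xlink:href="([^"]*\.jpg)\?q=80"')


def rewrite_svg_inplace(path, pagenumber):
    """Rewrite matching image URLs in the file and return the original URLs."""
    path = Path(path)
    lines = path.read_text(encoding="utf-8").split("\n")
    found = []
    for i, line in enumerate(lines):
        match = HREF_RE.search(line)
        if match:
            url = match.group(1)
            found.append(url)
            new_name = "page-{:0>3}-img1.jpg".format(pagenumber)
            # Replace the URL together with its "?q=80" suffix.
            lines[i] = line.replace(url + "?q=80", new_name)
    # Write back only after the whole file has been read and processed.
    path.write_text("\n".join(lines), encoding="utf-8")
    return found
```

The returned list holds the original URLs, so a caller could pass each one to urllib.request.urlretrieve afterwards, as in the question's code.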