Python regex nested search and replace


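I need to do a regex search and replace all commas found inside quoted blocks.
i.e.

"thing1,blah","thing2,blah","thing3,blah",thing4

needs to become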

"thing1\,blah","thing2\,blah","thing3\,blah",thing4

My code:

import re

with open(inFileName, 'r') as inFile:
    inFileRl = inFile.readlines()

p = re.compile(r'["]([^"]*)["]')
for line in inFileRl:
    pg = p.search(line)
    # found a quoted block
    if pg:
        q = re.compile(r'[^\\],')
        # found a comma within the quoted block
        qg = q.search(pg.group(0))
        if qg:
            # Here I want to reconstitute the line and print it with the replaced text
            # print(re.sub(r'([^\\])\,', r'\1\,', pg.group(0)))
            pass

I just need to pick out the columns I want with a regex, filter them further,
then do a regex replace, and then rebuild the line.

How can I do this in Python?
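A minimal one-pass sketch, assuming quoted blocks never contain embedded (escaped or doubled) quotes: `re.sub` accepts a function as the replacement, so each quoted block can be matched with `"[^"]*"` and have its unescaped commas escaped by a second, inner `re.sub`. The helper name `escape_quoted_commas` is hypothetical.

```python
import re

QUOTED = re.compile(r'"[^"]*"')        # one double-quoted block
BARE_COMMA = re.compile(r'(?<!\\),')   # a comma not already preceded by a backslash


def escape_quoted_commas(line):
    """Escape every unescaped comma that sits inside a double-quoted block."""
    # re.sub calls the lambda once per quoted block and splices its return
    # value back into the line, so text outside the quotes is left untouched.
    return QUOTED.sub(lambda m: BARE_COMMA.sub(r'\\,', m.group(0)), line)


print(escape_quoted_commas('"thing1,blah","thing2,blah","thing3,blah",thing4'))
# prints "thing1\,blah","thing2\,blah","thing3\,blah",thing4
```

Because the lookbehind skips commas that already have a backslash, running the function a second time leaves the line unchanged.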

EDIT

There was

"thing1\\,blah","thing2\\,blah","thing3\\,blah",thing4

in the question; now it is no longer there.

Moreover, I hadn't noticed

r'[^\\],'

So I completely rewrote my answer.

"thing1,blah","thing2,blah","thing3,blah",thing4

and

are the displayed forms of the strings (I think).

You can try this regex:


>>> import re
>>> print(re.sub('(?<!"),(?!")', r"\\,",
...              '"thing1,blah","thing2,blah","thing3,blah",thing4'))
"thing1\,blah","thing2\,blah","thing3\,blah",thing4
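The lookaround regex only inspects the characters immediately around each comma; it does not know whether the comma is inside quotes. A quick check with a hypothetical input whose last two fields are unquoted shows where it over-escapes, next to a quote-aware variant that only rewrites commas inside each `"..."` block:

```python
import re

line = '"a,b",c,d'   # hypothetical input: the last two fields are unquoted

# Lookaround version from the answer above: escapes any comma not touching a quote.
lookaround = re.sub(r'(?<!"),(?!")', r'\\,', line)
print(lookaround)    # prints "a\,b",c\,d  (the separator between c and d is escaped too)

# Quote-aware version: only commas inside a "..." block are rewritten.
quote_aware = re.sub(r'"[^"]*"', lambda m: m.group(0).replace(',', '\\,'), line)
print(quote_aware)   # prints "a\,b",c,d
```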


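The csv module is well suited to parsing data like this, because csv.reader
in the default dialect ignores commas inside quotes. csv.writer
re-inserts the quotes because the fields contain commas. I used StringIO
to give the string a file-like interface: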
import csv
import io

s = '''"thing1,blah","thing2,blah","thing3,blah"
"thing4,blah","thing5,blah","thing6,blah"'''
source = io.StringIO(s)
dest = io.StringIO()
rdr = csv.reader(source)
wtr = csv.writer(dest)
for row in rdr:
    # Un-escape any already-escaped commas first so they are not escaped twice,
    # then escape every comma in the field.
    wtr.writerow([item.replace('\\,', ',').replace(',', '\\,') for item in row])
print(dest.getvalue())

Result:
"thing1\,blah","thing2\,blah","thing3\,blah"
"thing4\,blah","thing5\,blah","thing6\,blah"
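To see the quoting behaviour this answer relies on: with the default dialect (`csv.QUOTE_MINIMAL`), `csv.writer` quotes only fields containing the delimiter, the quote character, or a line-terminator character, while `csv.QUOTE_ALL` (the setting in the asker's `EscapeExcel` dialect) quotes every field. A small sketch writing one row both ways; `write_row` is a hypothetical helper:

```python
import csv
import io


def write_row(row, **fmt):
    """Return the text csv.writer produces for a single row."""
    buf = io.StringIO()
    csv.writer(buf, **fmt).writerow(row)
    return buf.getvalue()


row = ['thing1\\,blah', 'thing4']
print(repr(write_row(row)))                         # '"thing1\\,blah",thing4\r\n'
print(repr(write_row(row, quoting=csv.QUOTE_ALL)))  # '"thing1\\,blah","thing4"\r\n'
```

Note that the default line terminator is `\r\n`, which is why it shows up in the output.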

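I came up with an iterative solution that uses several regex functions:

finditer(), findall(), group(), start() and end()

There is a way to turn all of this into a recursive function that calls itself.

Is anyone interested?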
import re

# inFileName / outfileName are the file names from the question.
with open(inFileName, 'r') as infile:
    outfileRl = infile.readlines()

p = re.compile(r'["]([^"]*)["]')   # a double-quoted block
q = re.compile(r'([^\\])(,)')      # a comma not preceded by a backslash

with open(outfileName, 'w') as outfile:
    for line in outfileRl:
        pglen = len(p.findall(line))
        if pglen == 0:
            # no quoted block: copy the line unchanged
            outfile.write(line)
            continue

        mpgend = 0
        for i, mpg in enumerate(p.finditer(line)):
            # text before the first quoted block, or between two quoted blocks
            outfile.write(line[mpgend:mpg.start()])

            block = mpg.group(0)
            mqgend = 0
            for mqg in q.finditer(block):
                # text between the previous comma match and this one
                outfile.write(block[mqgend:mqg.start()])
                # the matched "x," rewritten as "x\,"
                outfile.write(q.sub(r'\1\\\2', mqg.group(0)))
                mqgend = mqg.end()
            # rest of the quoted block after the last comma
            outfile.write(block[mqgend:])

            if i == pglen - 1:
                # text after the last quoted block (including the newline)
                outfile.write(line[mpg.end():])
            mpgend = mpg.end()
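The same finditer()/start()/end() splicing can be packaged as a function that works on a string, which makes it easy to test without files. A minimal sketch; the name `escape_by_splicing` is hypothetical, and it assumes quoted blocks contain no embedded quotes:

```python
import re

P = re.compile(r'"[^"]*"')     # a double-quoted block
Q = re.compile(r'(?<!\\),')    # a comma not already escaped


def escape_by_splicing(line):
    """Rebuild the line piece by piece, escaping commas only inside quotes."""
    pieces = []
    pos = 0  # end of the text already copied into pieces
    for m in P.finditer(line):
        pieces.append(line[pos:m.start()])        # text before this quoted block
        pieces.append(Q.sub(r'\\,', m.group(0)))  # quoted block with commas escaped
        pos = m.end()
    pieces.append(line[pos:])                     # text after the last block
    return ''.join(pieces)


print(escape_by_splicing('"ab,cd,ef",x,"g,h"'))
# prints "ab\,cd\,ef",x,"g\,h"
```

Keeping a single `pos` cursor guarantees every character between matches is copied exactly once, which is easy to get wrong when each comma match is written out separately.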

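Have you looked at str.replace()?

str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old
replaced by new. If the optional argument count is given, only the
first count occurrences are replaced.

Here is the documentation.
Hope this helps.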
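On its own, `str.replace` cannot tell quoted commas from field separators, so applied to a whole line it escapes every comma. A short sketch of the difference between applying it to the whole line and applying it only to each quoted block found with `re.sub`:

```python
import re

line = '"thing1,blah","thing2,blah",thing4'

# Whole line: the separators between fields are escaped as well.
whole = line.replace(',', '\\,')
print(whole)      # prints "thing1\,blah"\,"thing2\,blah"\,thing4

# Per quoted block: re.sub passes each "..." match to the lambda,
# which applies str.replace only to that block.
per_block = re.sub(r'"[^"]*"', lambda m: m.group(0).replace(',', '\\,'), line)
print(per_block)  # prints "thing1\,blah","thing2\,blah",thing4
```

Unlike a lookbehind such as `(?<!\\),`, plain `str.replace` would also turn an already-escaped `\,` into `\\,`, so it is only safe on input that contains no escapes yet.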
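This isn't really an answer, but before you reimplement this, might a CSV parser serve you better? That seems to be the format you are dealing with.

Actually, I'm preparing the data for my custom CSV parser: csv.register_dialect('EscapeExcel', delimiter=',', skipinitialspace=0, doublequote=1, quoting=csv.QUOTE_ALL, quotechar='"', lineterminator='\r\n', escapechar='\\')

I see. Then I believe you want to use the match object's span and start methods to get the surrounding text and reassemble your line. But I'm not sure why calling sub just once after the "selection" loop wouldn't work.

@Dragos Toader: Why replace the commas inside the quotes at all? csv.reader has no problem with commas inside quotes. Adding backslashes just gives the parser one more mechanism to handle, because now the backslashes need handling too. The proper fix is to teach your CSV parser to ignore commas inside double quotes, or to use an existing CSV parser.

Your solution is better than mine. You just need to write the pattern '([^"]+) *, *([^"]+)', or even '([^"]+)[ \t]*,[ \t]*([^"]+)' if there can be spaces or tabs around the comma.

Thanks! What about a string with two commas between the quotes? "thing1,blah,moreblah"

@StevenRumbalski Yes, it doesn't work in that case. There you would need both lookahead and lookbehind. I'll see whether I can make those changes.

How about re.sub('".+?"', lambda m: m.group(0).replace(",", "\\,"), '"th,ing1,blah","thing2","blah","thing3,blah","thing4"')?

This answer has been downvoted; I'd like to know why. I'm also puzzled that my rep went down by 1 instead of 2!

+1, but you need to write item.replace('\\,', ',').replace(',', '\\,'), otherwise "thing3\,blah" is turned into "thing3\\,blah".

Your code is very complicated: you use regexes for only a small part of what they can do, just to extract pieces of the string, and then process those pieces with string methods, when regexes could do that processing too. By the way, you didn't notice that your code turns "thing3,blah" into "thingx\x03,blah", and I don't see how. Also, did you mention anywhere that the question you asked had received answers and discussion? You give not the slightest hint of the other answers we hoped would help. Instead, you just present code, as if you hadn't read the answers. I find that a bit unfair.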
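As the comments point out, `csv.reader` with the default dialect already treats commas inside double quotes as part of the field, so no escaping is needed just to parse the question's sample line. `csv.reader` accepts any iterable of strings, so a list stands in for a file here:

```python
import csv

sample = ['"thing1,blah","thing2,blah","thing3,blah",thing4']
rows = list(csv.reader(sample))
print(rows)
# prints [['thing1,blah', 'thing2,blah', 'thing3,blah', 'thing4']]
```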
