Regex in Python


I am writing a regex to remove text qualifiers and extra delimiters from strings.

I have some patterns like the following:

"ID"~"Name"~"DESC"
1~2014~13~"DS"~DF"
1~2014~13~"DS"~"DF" 
"1ABCA~B C"~"ERTE"
"2"~"XYZ"~"ABC~ is~ bother"
"3"~"YYZ"~"MEL O CRÈME DOUGHNUTS RECLASS"
4~"XAA"~"sf~sd sdfsf"
5~"TES"~"SFSFSF"sdfsf"
6~"ABC"SDDSL~"dfadf"
The expected output is:

ID~Name~DESC
1~2014~13~DS~DF
1~2014~13~DS~DF
1ABCA B C~ERTE
2~XYZ~ABC  is  bother
3~YYZ~MEL O CRÈME DOUGHNUTS RECLASS
4~XAA~sf sd sdfsf
5~TES~SFSFSF"sdfsf
6~ABCSDDSL~dfadf
I have written the following code for this:

import re

delimiter = '~'
pattern = re.compile(r'"' + delimiter + r'"')
pattern1 = re.compile(r'"[^"]*(?:""[^"]*)*"')

with open("source file path here ", "r") as \
        test:
    for line in test:
        fields = re.split(pattern, line)
        print(fields)
        output = ""
        if re.match('^[^"]', line):
            matches = re.findall(pattern1, line)
            print(matches)
            for match in matches:
                line = re.sub(match, re.sub('^["]|["]$', "", match), line)
            print(line)
        else:
            lastfield = fields[-1]
            for field in fields:
                if field != lastfield:
                    field = re.sub('^["]|["]$', "", field)
                    output = output + re.sub('[' + delimiter + ']', " ", field) \
                        + delimiter
                else:
                    field = re.sub('^["]|["]$', "", field)
                    output = output + re.sub('[' + delimiter + ']', " ", field)
        print(output)

I am looking for an optimized way to do this, with code that handles all of the patterns.

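Why should the second line become 1~2014~13~DS~DF? The last " is unpaired. Or should the third and any further characters be kept in the last field?

@WiktorStribiżew Loading this data into the system causes problems, so we want to remove the quotes that are not part of the data. This is a data-level issue that we need to handle.

Please write the replacement rules in the question. Should 1~2014~13~DS~DF really be duplicated?

@WiktorStribiżew Sorry, my mistake, I left out one pattern, namely 1~2014~13~"DS"~"DF".

@WiktorStribiżew The rule is to remove unwanted delimiters and quotes. If a quote appears as part of the string, we need to keep it, because the data must stay unchanged.

I think you can remove every ~ inside a qualified field (replacing it with a space) by matching the qualified fields with

(?m)(?:(?<=^)|(?<=~))"(.*?)"(?=$|~)

Details

  • (?m) - the re.M modifier (remove it if you process the string line by line)
  • (?:(?<=^)|(?<=~)) - a position at the start of a line or immediately after a ~
  • " - the opening qualifier quote
  • (.*?) - Group 1: any characters other than line breaks, as few as possible
  • " - the closing qualifier quote
  • (?=$|~) - a position immediately followed by the end of the line or a ~

The remaining qualifier quotes can then be removed with a second pattern. It matches a quote at the start of a field, a quote at the end of a field, or a quote followed only by characters other than ", ~ and line breaks up to the end of the field:

(?m)(?:(?<=^)|(?<=~))"|"(?=$|~)|"(?=[^\n"~]+(?:~|$))
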
import re
rx_0 = r"""(?m)(?:(?<=^)|(?<=~))"(.*?)"(?=$|~)"""
rx = r"""(?m)(?:(?<=^)|(?<=~))"|"(?=$|~)|"(?=[^\n"~]+(?:~|$))"""
s = ("\"ID\"~\"Name\"~\"DESC\"\n"
    "1~2014~13~\"DS\"~DF\"\n"
    "1~2014~13~\"DS\"~\"DF\"\n"
    "\"1ABCA~B C\"~\"ERTE\"\n"
    "\"2\"~\"XYZ\"~\"ABC~ is~ bother\"\n"
    "\"3\"~\"YYZ\"~\"MEL O CRÈME DOUGHNUTS RECLASS\"\n"
    "4~\"XAA\"~\"sf~sd sdfsf\"\n"
    "5~\"TES\"~\"SFSFSF\"sdfsf\"\n"
    "6~\"ABC\"SDDSL~\"dfadf\"")

print(re.sub(rx, "", re.sub(rx_0, lambda x: x.group(1).replace('~', ' '), s)))

Output:

ID~Name~DESC
1~2014~13~DS~DF
1~2014~13~DS~DF
1ABCA B C~ERTE
2~XYZ~ABC  is  bother
3~YYZ~MEL O CRÈME DOUGHNUTS RECLASS
4~XAA~sf sd sdfsf
5~TES~SFSFSFsdfsf
6~ABC"SDDSL dfadf
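
If the file is read line by line, as in the question's code, the same two substitutions could be wrapped in a small helper with the (?m) flag dropped. The following is only a sketch under assumptions: make_cleaner and clean are hypothetical names that do not appear in the answer, and the delimiter is assumed to be a single character so that it can be placed inside a lookbehind and a character class.

```python
import re


def make_cleaner(delimiter="~"):
    """Build a per-line cleaner (hypothetical helper, not from the answer)."""
    if len(delimiter) != 1:
        raise ValueError("this sketch assumes a single-character delimiter")
    d = re.escape(delimiter)  # safe to embed even for characters such as | or ^

    # Pass 1: a qualified field opens at line start or right after a delimiter
    # and closes at a quote followed by a delimiter or the end of the line.
    qualified = re.compile(rf'(?:(?<=^)|(?<={d}))"(.*?)"(?=$|{d})')

    # Pass 2: leftover quotes at a field boundary, or a quote followed only by
    # ordinary characters up to the end of the field.
    stray = re.compile(
        rf'(?:(?<=^)|(?<={d}))"|"(?=$|{d})|"(?=[^\n"{d}]+(?:{d}|$))'
    )

    def clean(line):
        # Replace delimiters inside each qualified field with a space,
        # then strip the remaining qualifier quotes.
        line = qualified.sub(
            lambda m: m.group(1).replace(delimiter, " "), line
        )
        return stray.sub("", line)

    return clean


clean = make_cleaner("~")
for sample in ['"ID"~"Name"~"DESC"', '4~"XAA"~"sf~sd sdfsf"', '1~2014~13~"DS"~DF"']:
    print(clean(sample))
```

Compiling both patterns once in make_cleaner avoids rebuilding them for every line; inside a file loop each line would be passed as clean(line.rstrip("\n")).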