Regex pattern for quoted numbers containing commas


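I am trying to find the right regex to search a file for double-quoted numbers separated by commas. For example, I am trying to find "27,422,734" and then replace it in a text editor so that a comma falls after every 4 digits, counted from the right, so the end result would be "2742,2734".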

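I tried a few examples I found on SO, but none of them helped me with this particular problem:

"[^"]+"

'\d+'

While the patterns above do find matches, I don't know how to handle the commas or what to replace them with.

Thanks for your help.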

There may be a better way, but I suggest the following approach:

Input:

$ cat to_transform.txt
abc "27,422,734" def"27,422,734" def
ltu "123,734" abc "345,678,123,734" vtu
xtz "345,678,123,734" vtu "345,678,123,734"
u "1" a
"123"
iu"abc"a "123,734"
CMD:

$ paste -d' ' <(grep -oP '(?<=")(?:\d+,\d+)+(?=")' to_transform.txt) <(grep -oP '(?<=")(?:\d+,\d+)+(?=")' to_transform.txt | sed -e 's/,//g;:loop s/\([0-9]\{4\}\)\($\|,\)/\2,\1/g; s/,,/,/g; /\([0-9]\{5\}\)/b loop') | awk '{cmd="sed -i 0,/"$1"/s/" $1 "/" $2 "/ to_transform.txt"; system(cmd)}'
$ cat to_transform.txt
abc "2742,2734" def"2742,2734" def
ltu "12,3734" abc "3456,7812,3734" vtu
xtz "3456,7812,3734" vtu "3456,7812,3734"
u "1" a
"123"
iu"abc"a "12,3734"
Code details and explanation:

s/,//g #remove every comma in the number
:loop  #create a label to loop back to
s/\([0-9]\{4\}\)\($\|,\)/\2,\1/g #put a comma in front of every run of 4 digits that ends at the end of the string or at a previously added comma
s/,,/,/g #remove any duplicate commas added by the previous step
/\([0-9]\{5\}\)/b loop #if the string still contains 5 or more consecutive digits, jump back to the label and continue processing
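
To watch that loop in isolation, the same sed script can be run on a single number instead of a whole file. This is only a minimal sketch: it assumes GNU sed (the `\|` alternation is a GNU extension), and the function name `regroup_loop` is made up for illustration and does not appear in the answer.

```shell
#!/bin/bash
# Hypothetical helper (not part of the answer): regroup one comma-separated
# number into groups of 4 digits, counted from the right. Assumes GNU sed.
#   s/,//g         strip the existing commas
#   :loop          label to jump back to
#   s/.../\2,\1/g  put a comma in front of each 4-digit run that ends at
#                  the end of the string or at a comma
#   s/,,/,/g       collapse doubled commas produced by the previous step
#   /[0-9]{5}/b    loop again while 5 or more consecutive digits remain
regroup_loop() {
    printf '%s\n' "$1" | sed \
        -e 's/,//g' \
        -e ':loop' \
        -e 's/\([0-9]\{4\}\)\($\|,\)/\2,\1/g' \
        -e 's/,,/,/g' \
        -e '/[0-9]\{5\}/b loop'
}

regroup_loop '27,422,734'
regroup_loop '345,678,123,734'
```

With GNU sed the two calls should print 2742,2734 and 3456,7812,3734, the same values that appear inside the quotes in the transformed file.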

  • Prerequisite: your input matches "[0-9,]*" and is a well-formed number in "#,###" format

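    #!/bin/bash
    # Move the commas of a quoted "#,###" number so that they group 4 digits.
    # Step 1: drop the existing 3-digit grouping commas.
    # Step 2: reverse the string so groups of 4 are counted from the right,
    #         append a comma to every 4 digits, repair the quotes, reverse back.
    # Note: with fewer than 4 digits (e.g. '"734"') this version leaves a
    #       stray extra quote at the end of the output.
    colonmv () {
         printf '%s\n' "$1" | sed -r 's/,([0-9]{3})+/\1/g;' | \
         rev | sed -r 's/[^0-9]?([0-9]{4})/\1,/g;s/,"$/"/;s/.*/"&/' | rev
    }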
    
    colonmv '"734"'
    colonmv '"2,734"'
    colonmv '"22,734"'
    colonmv '"422,734"'
    colonmv '"7,422,734"'
    colonmv '"27,422,734"'
    colonmv '"127,422,734"'
    colonmv '"5,127,422,734"'
    
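    Test:


    I found a shorter solution (works with GNU sed):

    Note, however, that the first sed command eats every comma, not just the ones between digits, so refine it or filter the input beforehand.

    The second command uses the :a trick:

    it reads 4 digits followed by a non-digit (>), replaces them with the same text plus a comma, and whenever a substitution took place, ta jumps back to :a and repeats.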
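
The code of that shorter solution is not reproduced in this thread, so the following is only a sketch of the `:a` / `ta` technique the description outlines, not the author's original command. The function name `regroup_t` and the exact regular expression are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical reconstruction of the ":a ... ta" idea; NOT the original command.
#   s/,//g  drops every comma (also those outside numbers, as warned above)
#   :a      marks the start of the loop
#   s/.../  puts a comma in front of the last 4 digits of a digit run that
#           is still longer than 4 digits
#   ta      jumps back to :a whenever that substitution changed the line
regroup_t() {
    printf '%s\n' "$1" | sed -E \
        -e 's/,//g' \
        -e ':a' \
        -e 's/([0-9])([0-9]{4})([^0-9]|$)/\1,\2\3/' \
        -e 'ta'
}

regroup_t '"27,422,734"'
regroup_t '"734"'
```

Because the substitution only fires when a digit precedes the final 4 digits, short inputs such as "734" pass through unchanged. The loop ends as soon as no digit run longer than 4 remains.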

    Now, let's see colonmv in the wild:

    colonmv '"A 3-grouped, pretty long number: 5,127,422,734 and an ungrouped one 5678905567789065778"'
    "A 3-grouped pretty long number: 51,2742,2734 and an ungrouped one 567,8905,5677,8906,5778"
    

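    Are you in bash/shell? Which tool are you using to replace the text? Which language?

    @Allan: Is that meant ironically? :) I have to admit I am not at all familiar with the greedy/non-greedy prefixes. Maybe because they are fairly new? Sooner or later I will have to learn more about them. Sigh. :)

    No, no irony at all. Look at my answer, it is far too long! Performance-wise I think yours is fine too! (I also have to loop in the sed code, I call sed several times, and I let awk process a temporary file, so…

    @Allan: I found a shorter one.

    In 734"", is the "" a typo?

    You spotted it! No, I tried it; it is an artifact of the command. My new solution does not have this error. :)

    Thanks for your help! I finally figured it out :)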