Regex pattern with quotes and commas

regex bash unix awk sed

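I am trying to find the right regex to search a file for double-quoted numbers whose digit groups are separated by commas. For example, I am trying to find

"27,422,734"

and then replace it in a text editor, moving the commas so that they fall every 4 digits, so the end result would be

"2742,2734"

I tried a few examples I found on SO, but none of them got me all the way there, such as

"[^"]+"

'\d+'

While these do find matches, I don't know how to handle the commas or what to use as the replacement.

Thanks for your help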

There may be a better way, but I suggest the following approach:

Input:

$ cat to_transform.txt
abc "27,422,734" def"27,422,734" def
ltu "123,734" abc "345,678,123,734" vtu
xtz "345,678,123,734" vtu "345,678,123,734"
u "1" a
"123"
iu"abc"a "123,734"

CMD (relies on bash process substitution and on GNU grep and GNU sed):

$ paste -d' ' <(grep -oP '(?<=")(?:\d+,\d+)+(?=")' to_transform.txt) <(grep -oP '(?<=")(?:\d+,\d+)+(?=")' to_transform.txt | sed -e 's/,//g;:loop s/\([0-9]\{4\}\)\($\|,\)/\2,\1/g; s/,,/,/g; /\([0-9]\{5\}\)/b loop') | awk '{cmd="sed -i 0,/"$1"/s/" $1 "/" $2 "/ to_transform.txt"; system(cmd)}'

$ cat to_transform.txt
abc "2742,2734" def"2742,2734" def
ltu "12,3734" abc "3456,7812,3734" vtu
xtz "3456,7812,3734" vtu "3456,7812,3734"
u "1" a
"123"
iu"abc"a "12,3734"

Code details and explanation:

s/,//g #remove all commas in the number
:loop  #create a label to loop back to
s/\([0-9]\{4\}\)\($\|,\)/\2,\1/g #insert a comma before each group of 4 digits that sits at the end of the string or just before a comma added earlier
s/,,/,/g #remove any duplicate commas added by the previous step
/\([0-9]\{5\}\)/b loop #if the string still contains a run of at least 5 digits, jump back to the label and continue processing
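To watch the loop work on a single number, the sed script can be run on its own. This is only a sketch: the function name regroup4 is made up for illustration, and it assumes GNU sed, since \| inside a basic regular expression is a GNU extension.

```shell
# regroup4 is a hypothetical helper name, not part of the answer above.
# Assumes GNU sed (\| alternation in a basic regex is a GNU extension).
regroup4() {
    printf '%s\n' "$1" | sed -e 's/,//g' \
        -e ':loop' \
        -e 's/\([0-9]\{4\}\)\($\|,\)/\2,\1/g' \
        -e 's/,,/,/g' \
        -e '/[0-9]\{5\}/b loop'
}

# Each call regroups one comma-separated number into groups of 4 digits.
regroup4 '27,422,734'
regroup4 '345,678,123,734'
```

With the sample data these two calls should print 2742,2734 and 3456,7812,3734, the same values that appear in the transformed file.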

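Prerequisite: your input matches "[0-9,]*" and is a correctly formed number in "#,###" format.

#!/bin/bash
colonmv () {
     # 1st sed: drop the comma in front of each 3-digit group.
     # rev + 2nd sed: on the reversed string, put a comma after every 4 digits
     # (this swallows the leading quote), drop a trailing comma, re-add the quote.
     # Known quirk: with fewer than 4 digits no quote is swallowed, so '"734"' gains an extra quote.
     echo "$1" | sed -r 's/,([0-9]{3})+/\1/g;' | \
     rev | sed -r 's/[^0-9]?([0-9]{4})/\1,/g;s/,"$/"/;s/.*/"&/' | rev
}

Test:

colonmv '"734"'
colonmv '"2,734"'
colonmv '"22,734"'
colonmv '"422,734"'
colonmv '"7,422,734"'
colonmv '"27,422,734"'
colonmv '"127,422,734"'
colonmv '"5,127,422,734"'

I found a shorter solution (works with GNU sed):

But note that the first sed command eats every comma, not only the ones between digits, so refine it or filter your input first.

The second command uses the :a trick. It reads 4 digits together with an adjacent non-digit and replaces them with the same text plus a comma; whenever a replacement took place, ta jumps back to :a and repeats.

Now, let's see colonmv in the wild: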
colonmv '"A 3-grouped, pretty long number: 5,127,422,734 and an ungrouped one 5678905567789065778"'
"A 3-grouped pretty long number: 51,2742,2734 and an ungrouped one 567,8905,5677,8906,5778"
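The short command itself did not survive in this copy of the answer, so what follows is only a sketch of the `:a` / `ta` technique described above, not the author's original command. The function name group4 and the exact regular expression are assumptions chosen to fit the description; GNU sed is assumed as well.

```shell
# group4 is a hypothetical name; the regex below is an assumption, not the original command.
group4() {
    printf '%s\n' "$1" | sed -E \
        -e 's/,//g' \
        -e ':a' \
        -e 's/([0-9])([0-9]{4})([^0-9]|$)/\1,\2\3/' \
        -e 'ta'
    # s/,//g   removes every comma, including ones outside numbers
    # :a       defines the label to jump back to
    # s/.../   puts a comma in front of the last 4 digits of a digit run
    # ta       jumps back to :a whenever the substitution changed something
}

group4 '"27,422,734"'
group4 '"734"'
```

The first call should print "2742,2734", and "734" should come back unchanged, without a stray extra quote.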

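Are you in bash/shell? What tool are you using to replace the text? Which language?

@Allan: Is that meant ironically? :) I must admit I'm not at all familiar with the greedy/non-greedy prefixes. Maybe because they are so new? Sooner or later I'll have to learn more about them. Sigh. :)

No, no irony at all. Look at my answer, it's so long! Performance-wise I think yours is fine too! (I also have to loop in the sed code, I call sed several times, and I have awk process a temporary file, so…

@Allan: Found a shorter one.

For 734, is the "" a typo?

You spotted it! No, I tried it; it's an artifact of the command. My new solution doesn't have this error. :)

Thanks for your help! I finally got it :)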