Regex: a regular expression pattern for numbers with quotes and commas
I am trying to find the right regular expression to search a file for double-quoted numbers whose digits are separated by commas. For example, I am trying to find
"27,422,734"
and then replace it in a text editor so that the commas fall after every 4 digits instead, making the end result "2742,2734".
I tried a few examples I found on SO, but none of them helped me with this particular problem:
"[^"]+"
'\d+'
While the patterns above do find matches, I don't know how to handle the commas or what to use as the replacement.
Thanks for your help.

There may be a better way, but I suggest the following approach.
Input:
$ cat to_transform.txt
abc "27,422,734" def"27,422,734" def
ltu "123,734" abc "345,678,123,734" vtu
xtz "345,678,123,734" vtu "345,678,123,734"
u "1" a
"123"
iu"abc"a "123,734"
CMD:
$ paste -d' ' <(grep -oP '(?<=")(?:\d+,\d+)+(?=")' to_transform.txt) <(grep -oP '(?<=")(?:\d+,\d+)+(?=")' to_transform.txt | sed -e 's/,//g;:loop s/\([0-9]\{4\}\)\($\|,\)/\2,\1/g; s/,,/,/g; /\([0-9]\{5\}\)/b loop') | awk '{cmd="sed -i 0,/"$1"/s/" $1 "/" $2 "/ to_transform.txt"; system(cmd)}'
$ cat to_transform.txt
abc "2742,2734" def"2742,2734" def
ltu "12,3734" abc "3456,7812,3734" vtu
xtz "3456,7812,3734" vtu "3456,7812,3734"
u "1" a
"123"
iu"abc"a "12,3734"
Code details and explanation (the sed script):
s/,//g #remove every comma in the number
:loop #create a label to loop back to
s/\([0-9]\{4\}\)\($\|,\)/\2,\1/g #put a comma in front of every group of 4 digits, working from the end of the string or from the last comma added
s/,,/,/g #remove duplicate commas added by the previous step, if any
/\([0-9]\{5\}\)/b loop #if at least 5 consecutive digits remain, jump back to the label and keep processing
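To try the sed script on its own, without the grep/paste/awk plumbing, it can be wrapped in a small function. This is only a sketch: the name regroup4 is made up for illustration, and the script is written with -E and one -e per command rather than as the single string used in the answer. It assumes GNU sed and an input that consists of the number alone.

```shell
#!/bin/bash
# Hypothetical helper (not part of the answer): regroup one comma-separated
# number into groups of 4 digits, counted from the right.
# The five -e expressions mirror the five steps explained above:
#   1. drop all commas
#   2. define the label "loop"
#   3. put a comma in front of each 4-digit group that ends at a comma or at the end
#   4. collapse doubled commas
#   5. branch back to "loop" while a run of 5 or more digits remains
regroup4 () {
    printf '%s\n' "$1" | sed -E \
        -e 's/,//g' \
        -e ':loop' \
        -e 's/([0-9]{4})($|,)/\2,\1/g' \
        -e 's/,,/,/g' \
        -e '/[0-9]{5}/b loop'
}

regroup4 '27,422,734'
regroup4 '123,734'
regroup4 '345,678,123,734'
```

For the three sample numbers this prints 2742,2734, 12,3734 and 3456,7812,3734, which matches the values in the transformed file.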
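Prerequisite: your input matches "[0-9,]*" and is a correctly formed number in "#,###" format.
Test:
#!/bin/bash
colonmv () {
    echo $1 | sed -r 's/,([0-9]{3})+/\1/g;' | \
    rev | sed -r 's/[^0-9]?([0-9]{4})/\1,/g;s/,"$/"/;s/.*/"&/' | rev
}
colonmv '"734"'
colonmv '"2,734"'
colonmv '"22,734"'
colonmv '"422,734"'
colonmv '"7,422,734"'
colonmv '"27,422,734"'
colonmv '"127,422,734"'
colonmv '"5,127,422,734"'
I found a shorter solution (for use with GNU sed).
Note, however, that its first sed command eats every comma, not just the commas between digits, so refine it or filter your input beforehand.
The second command uses the :a trick: it matches 4 digits next to a non-digit and replaces them with the same text plus a comma; whenever a substitution has happened, ta jumps back to :a and the step is repeated.
Now, let's see colonmv in the wild:
colonmv '"A 3-grouped, pretty long number: 5,127,422,734 and an ungrouped one 5678905567789065778"'
"A 3-grouped pretty long number: 51,2742,2734 and an ungrouped one 567,8905,5677,8906,5778"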
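The shorter command itself did not survive in this copy of the thread, so the following is only a sketch of the :a ... ta technique, not the original answer. The function name regroup_line is hypothetical. Stripping only commas that sit directly between two digits is an assumption made here to avoid eating unrelated commas, and it relies on the input being in valid "#,###" format. It assumes GNU sed.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): regroup every number on a line
# into groups of 4 digits, counted from the right.
#   1st expression: remove commas that sit directly between two digits
#   :a              label to jump back to
#   3rd expression: find a digit followed by exactly the last 4 digits of a
#                   run (the run ends at a non-digit or at end of line) and
#                   insert a comma between them
#   ta              if a substitution happened, branch to :a and try again
regroup_line () {
    printf '%s\n' "$1" | sed -E \
        -e 's/([0-9]),([0-9])/\1\2/g' \
        -e ':a' \
        -e 's/([0-9])([0-9]{4})($|[^0-9])/\1,\2\3/' \
        -e 'ta'
}

regroup_line '"27,422,734"'
regroup_line '"A 3-grouped, pretty long number: 5,127,422,734 and an ungrouped one 5678905567789065778"'
```

Each pass adds one comma, so the loop ends once no run of 5 or more digits is left. Unlike the output shown above, this variant keeps the comma after "3-grouped", because that comma is followed by a space rather than a digit.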
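Are you in bash/shell? What tool are you using to replace the text? Which language?
@Allan: Is that sarcasm? :) I must admit I'm not at all familiar with the greedy/non-greedy prefixes. Maybe because they are fairly new? Sooner or later I'll have to learn more about them. Sigh. :)
No, no sarcasm at all. Look at my answer, it's far too long! As for performance, I think yours is fine too! (I also have to loop in the awk code, I call sed several times, and I make sed work on a temporary file, so…)
@Allan: A typo? You spotted it! No, I tried it, it is an artifact of the command. My new solution doesn't have this bug. :)
Thanks for the help! I finally got it :)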