Regex: how to replace everything except the string of interest
file.txt:
fruits:banana,apple,grape,limon,orange,tomate,
fruits:apple,limon,
fruits:banana,grape,limon,
fruits:orange,tomate,grape,
fruits:banana,
fruits:apple,
fruits:banana,apple,
I need to replace every fruit other than "banana" and get output like this:
fruits:banana,FRUIT,FRUIT,FRUIT,FRUIT,FRUIT,
fruits:FRUIT,FRUIT,
fruits:banana,FRUIT,FRUIT,
fruits:FRUIT,FRUIT,FRUIT,
fruits:banana,
fruits:FRUIT,
fruits:banana,FRUIT,
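I tried awk, but I can only replace fields that match a specific string.
For example, replacing every "apple" with FRUIT2, or replacing every "apple" with FRUIT2 and every "tomate" or "orange" with FRUIT3: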
awk -F":" '{ gsub(/apple/,"FRUIT2",$2); print }' OFS=":" file.txt
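or
awk -F":" '{ gsub(/apple/,"FRUIT2",$2); gsub(/tomate|orange/,"FRUIT3",$2); print }' OFS=":" file.txt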
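But what I really need is to also replace everything that doesn't match any of those strings with something like FRUIT4.
How can I produce output like this: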
fruits:FRUIT4,FRUIT2,FRUIT4,FRUIT4,FRUIT3,FRUIT3,
fruits:FRUIT2,FRUIT4,
fruits:FRUIT4,FRUIT4,FRUIT4,
fruits:FRUIT3,FRUIT3,FRUIT4,
fruits:FRUIT4,
fruits:FRUIT2,
fruits:FRUIT4,FRUIT2,
This awk should work for your case:
awk -F, -v OFS=, '{
    for (i=1; i<=NF; i++)
        if ($i !~ /(^|:)banana$/)
            sub(/[^:]+$/, "FRUIT", $i)
} 1' file
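A variant of the same idea that compares whole fields instead of matching a regex, so a target such as apple can never match inside pineapple, and the "fruits:" label never takes part in the comparison. This is only a sketch: keep_only is a hypothetical helper name, and it assumes every line has the label:item,item, shape shown in file.txt.

```shell
# keep_only NAME (hypothetical helper): read "label:a,b,c," lines on stdin and
# replace every list item except NAME with FRUIT.
keep_only() {
  awk -v keep="$1" -F '[:,]' '
  {
    out = $1 ":"                        # leave the "fruits:" label untouched
    for (i = 2; i <= NF; i++) {
      f = $i
      if (f != "" && f != keep)         # whole-field comparison, no regex
        f = "FRUIT"
      out = out f (i < NF ? "," : "")   # empty last field leaves the trailing comma
    }
    print out
  }'
}

# Demo on three lines from file.txt
printf '%s\n' 'fruits:banana,apple,grape,limon,orange,tomate,' \
              'fruits:apple,limon,' \
              'fruits:banana,apple,' | keep_only banana
```

Splitting on both : and , puts the label in $1 on its own, so the loop only ever sees the fruit names.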
To automate the process (numbering each distinct fruit and printing the resulting map), you could use:
awk -F '[:,]' -v OFS=, '
{
    for (i=2; i<=NF; i++)
        if ($i)
            if (seen[$i])
                $i = seen[$i]
            else
                $i = seen[$i] = "FRUIT" ++n
    sub(OFS, ":")
    print
}
END {
    print "map:"
    for (key in seen)
        print key "\t" seen[key]
}
' file
If you want to be able to specify the old-to-new name mapping flexibly on the command line, do this:
$ cat tst.awk
BEGIN {
    FS="[:,]"; OFS=","
    split(map,t)
    for (i=1; i in t; i+=2) {
        m[t[i]] = t[i+1]
    }
}
{
    printf "%s:", $1
    for (i=2;i<=NF;i++) {
        if ($i in m ) { $i = m[$i] }
        else if ("*" in m) { $i = m["*"] }
        printf "%s%s", $i, (i<NF?OFS:ORS)
    }
}
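What would you want the output to look like? It's not clear whether you want specific fruit names mapped to specific new values (apple=Frooth39), every new fruit name mapped to a new value built from some base with a numeric suffix (Frooth1, Frooth2, etc.), or something else.
This approach will fail on other fruits: for example, if the target were apple instead of banana and the input file contained pineapple, then pineapple would not be converted to FRUIT.
I think including the part before the : in $1, and therefore in the comparison, really isn't the right approach, since it leads to unnecessary complexity and potential error cases.
Thanks Ed, yes, a word boundary is needed to protect the search.
You're welcome. You should mention that this makes it GNU awk-specific. Oh, and a nitpick: the gsub should just be a sub.
Thanks again, yes, gsub really isn't needed. I've also changed the regex to (^|:) so it's not GNU awk-specific.
Sounds good, now let's just hope the OP never has to deal with Açai berries or similar names containing spaces, etc. :-)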
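A small demonstration of the pineapple point raised in these comments, assuming the comma-split approach of the first answer: with an unanchored pattern the pineapple field counts as a match and is left alone, while the anchored (^|:)apple$ form only spares the item that is exactly apple. replace_except is a hypothetical helper that passes the pattern in with -v and uses it as a dynamic regex.

```shell
# replace_except RE (hypothetical helper): split on "," as in the first answer
# and replace every field that does NOT match RE with FRUIT.
replace_except() {
  awk -F, -v OFS=, -v re="$1" '{
    for (i = 1; i <= NF; i++)
      if ($i !~ re)                     # field does not match the pattern to keep
        sub(/[^:]+$/, "FRUIT", $i)      # replace only the part after any ":"
  } 1'
}

line='fruits:pineapple,apple,'
printf '%s\n' "$line" | replace_except 'apple'          # pineapple slips through
printf '%s\n' "$line" | replace_except '(^|:)apple$'    # only apple is kept
```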
Running the automated version on file gives:
fruits:FRUIT1,FRUIT2,FRUIT3,FRUIT4,FRUIT5,FRUIT6,
fruits:FRUIT2,FRUIT4,
fruits:FRUIT1,FRUIT3,FRUIT4,
fruits:FRUIT5,FRUIT6,FRUIT3,
fruits:FRUIT1,
fruits:FRUIT2,
fruits:FRUIT1,FRUIT2,
map:
orange FRUIT5
tomate FRUIT6
apple FRUIT2
limon FRUIT4
banana FRUIT1
grape FRUIT3
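for (key in seen) walks the array in an unspecified order, which is why the map: lines above are not listed as FRUIT1 to FRUIT6. A sketch of the automated version that also records first-seen order in a second array (number_fruits and order are hypothetical names, not part of the answer). It also forces $0 to be rebuilt on every line, so sub(OFS, ":") only ever turns the separator after the label back into ":", even on a line where no field gets replaced.

```shell
# number_fruits (hypothetical helper): replace each distinct fruit with FRUITn,
# numbered by first appearance, then print the map in that same order.
number_fruits() {
  awk -F '[:,]' -v OFS=, '
  {
    $1 = $1                             # always rebuild $0 with OFS (",")
    for (i = 2; i <= NF; i++)
      if ($i != "") {                   # skip the empty field after a trailing comma
        if (!($i in seen)) {            # first time this name appears
          seen[$i] = "FRUIT" (++n)
          order[n] = $i                 # remember first-seen order
        }
        $i = seen[$i]
      }
    sub(OFS, ":")                       # turn the first "," back into ":"
    print
  }
  END {
    print "map:"
    for (k = 1; k <= n; k++)            # numeric loop, so the order is fixed
      print order[k] "\t" seen[order[k]]
  }'
}

printf '%s\n' 'fruits:banana,apple,grape,limon,orange,tomate,' \
              'fruits:apple,limon,' | number_fruits
```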
Sample runs of tst.awk:
$ awk -v map='apple,FRUIT2,tomate,FRUIT3,*,FRUIT4' -f tst.awk file
fruits:FRUIT4,FRUIT2,FRUIT4,FRUIT4,FRUIT4,FRUIT3,FRUIT4
fruits:FRUIT2,FRUIT4,FRUIT4
fruits:FRUIT4,FRUIT4,FRUIT4,FRUIT4
fruits:FRUIT4,FRUIT3,FRUIT4,FRUIT4
fruits:FRUIT4,FRUIT4
fruits:FRUIT2,FRUIT4
fruits:FRUIT4,FRUIT2,FRUIT4
$ awk -v map='apple,BAZINGA,*,VEGGIE' -f tst.awk file
fruits:VEGGIE,BAZINGA,VEGGIE,VEGGIE,VEGGIE,VEGGIE,VEGGIE
fruits:BAZINGA,VEGGIE,VEGGIE
fruits:VEGGIE,VEGGIE,VEGGIE,VEGGIE
fruits:VEGGIE,VEGGIE,VEGGIE,VEGGIE
fruits:VEGGIE,VEGGIE
fruits:BAZINGA,VEGGIE
fruits:VEGGIE,BAZINGA,VEGGIE
$ awk -v map='apple,FRUIT2,tomate,FRUIT3' -f tst.awk file
fruits:banana,FRUIT2,grape,limon,orange,FRUIT3,
fruits:FRUIT2,limon,
fruits:banana,grape,limon,
fruits:orange,FRUIT3,grape,
fruits:banana,
fruits:FRUIT2,
fruits:banana,FRUIT2,
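In the first two runs, the empty field after each trailing comma is also mapped through *, which is why those lines end in an extra FRUIT4 (or VEGGIE) instead of a comma. A sketch of the same script with a guard that leaves empty fields alone; with orange,FRUIT3 added to the map it should reproduce the output requested in the question. map_fruits is a hypothetical wrapper name, and passing "," explicitly to split is a small change from tst.awk, which relies on FS for that split.

```shell
# map_fruits MAP (hypothetical wrapper): MAP is "old,new,old,new,...", where
# the key * supplies the default for any name not listed.
map_fruits() {
  awk -v map="$1" '
  BEGIN {
    FS = "[:,]"; OFS = ","
    n = split(map, t, ",")              # pairs: t[1]->t[2], t[3]->t[4], ...
    for (i = 1; i < n; i += 2)
      m[t[i]] = t[i+1]
  }
  {
    printf "%s:", $1
    for (i = 2; i <= NF; i++) {
      if ($i != "") {                   # leave the empty trailing field alone
        if ($i in m)         { $i = m[$i] }
        else if ("*" in m)   { $i = m["*"] }
      }
      printf "%s%s", $i, (i < NF ? OFS : ORS)
    }
  }'
}

printf '%s\n' 'fruits:banana,apple,grape,limon,orange,tomate,' \
              'fruits:banana,apple,' |
  map_fruits 'apple,FRUIT2,tomate,FRUIT3,orange,FRUIT3,*,FRUIT4'
```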