Awk: how to improve a regular expression?
I have a file:
@Book{gjn2011ske,
author = {Grzegorz J. Nalepa},
title = {Semantic Knowledge Engineering. A Rule-Based Approach},
publisher = {Wydawnictwa AGH},
year = 2011,
address = {Krak\'ow}
}
@article{gjn2010jucs,
Author = {Grzegorz J. Nalepa},
Journal = {Journal of Universal Computer Science},
Number = 7,
Pages = {1006-1023},
Title = {Collective Knowledge Engineering with Semantic Wikis},
Volume = 16,
Year = 2010
}
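I want to improve the regular expression so that it removes only the first line of each record. Note: the record separator RS="}\n" cannot be changed.
I tried: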
awk 'BEGIN{ RS="}\n" } {gsub(/@.*,/,"") ; print }' file
I want to print this result:
author = {Grzegorz J. Nalepa},
title = {Semantic Knowledge Engineering. A Rule-Based Approach},
publisher = {Wydawnictwa AGH},
year = 2011,
address = {Krak\'ow}
Author = {Grzegorz J. Nalepa},
Journal = {Journal of Universal Computer Science},
Number = 7,
Pages = {1006-1023},
Title = {Collective Knowledge Engineering with Semantic Wikis},
Volume = 16,
Year = 2010
Thanks for your help.
Edit:
My proposed solution:
awk 'BEGIN{ RS="}\n" }{sub(",","@"); sub(/@.*@/,""); print }' file
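With the specified RS setting it is hard to get the result you want, because the line address = {Krak\'ow} ends in "}" followed by a newline and is therefore treated as an extra end of record. I would rather go with: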
awk '$0 !~ "^@" && $0 !~ "^} *$" { print }' FILE
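To make the problem with the fixed RS visible, the sketch below prints every record awk sees when RS is "}\n". It assumes an awk that accepts a multi-character RS (gawk and mawk do; POSIX only guarantees that the first character is used), and it works on a shortened copy of the sample written to a hypothetical temporary file.

```shell
# Hypothetical scratch file holding a shortened copy of the sample.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
@Book{gjn2011ske,
year = 2011,
address = {Krak\'ow}
}
@article{gjn2010jucs,
Year = 2010
}
EOF

# Print every record in brackets, prefixed with its record number.
records=$(awk 'BEGIN { RS = "}\n" } { printf "%d:[%s]\n", NR, $0 }' "$tmp")
printf '%s\n' "$records"
rm -f "$tmp"
```

With gawk this should show three records: the first stops at Krak\'ow without its closing brace, the second is empty (the lone "}" line), and the third is the @article entry.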
Edit: I don't see why it has to be a regexp solution. Could you explain?
Anyway, here is another solution that uses a regexp, though not the one you were expecting:
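awk 'BEGIN{ RS="}\n" }
{
  # split the record into lines; split() returns the number of lines
  n = split($0,a,"\n")
  for (e=1;e<=n;e++) {
    # restore the closing brace that RS consumed
    if (a[e] ~ "{" && a[e] !~ "}") {
      sub("$","}",a[e])
    }
    # print only the key = value lines
    if (a[e] ~ "=") { print a[e] }
  }
  printf("\n")
}' INPUTFILE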
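If a single substitution under the fixed RS is what is wanted, one possible sketch (not part of the original answer) is to anchor the pattern to the first line with [^\n]*, which cannot run past a line break the way .* does in awk. The code below assumes an awk with multi-character RS support (gawk or mawk), a shortened sample in a hypothetical temporary file, and that a record without a trailing newline lost its closing brace to RS.

```shell
# Hypothetical scratch file holding a shortened copy of the sample.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
@Book{gjn2011ske,
year = 2011,
address = {Krak\'ow}
}
@article{gjn2010jucs,
Year = 2010
}
EOF

result=$(awk 'BEGIN { RS = "}\n" }
{
  sub(/^@[^\n]*\n/, "")            # drop only the "@..." header line
  if ($0 == "") next               # empty record left by a lone closing brace
  if ($0 !~ /\n$/) $0 = $0 "}\n"   # put back the brace that RS consumed
  printf "%s", $0
}' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

The original attempt, gsub(/@.*,/,""), removes too much because "." also matches a newline in awk, so the greedy match runs to the last comma in the record.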
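Here is one way that does not use a regular expression. Set the field separator to a newline, so that each key of the record becomes a field. Then loop over the fields and print those that do not start with @: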
awk '
  BEGIN {
    RS="}\n";
    FS=OFS="\n";
  }
  {
    for (i=1; i<=NF; i++) {
      if ( substr($i, 1, 1) != "@" ) {
        printf "%s%s", $i, (i == NF) ? RS : OFS;
      }
    }
  }
' file
I would use GNU sed for this:
sed '/^@/,/^}$/ { //d }' file.txt
Result:
author = {Grzegorz J. Nalepa},
title = {Semantic Knowledge Engineering. A Rule-Based Approach},
publisher = {Wydawnictwa AGH},
year = 2011,
address = {Krak\'ow}
Author = {Grzegorz J. Nalepa},
Journal = {Journal of Universal Computer Science},
Number = 7,
Pages = {1006-1023},
Title = {Collective Knowledge Engineering with Semantic Wikis},
Volume = 16,
Year = 2010
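Note that you can use the -i flag to edit in place (i.e. overwrite the file contents), and the -s flag to treat multiple files as separate inputs. For example: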
sed -s -i '/^@/,/^}$/ { //d }' *.txt
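Since -i overwrites the files, it may be safer to keep a backup while trying the command out. The sketch below assumes GNU sed, where a suffix attached to -i (here .bak) saves the original under that extension; the scratch directory and file names are hypothetical.

```shell
# Hypothetical scratch directory with two small sample files.
dir=$(mktemp -d)
printf '%s\n' '@Book{a,' 'year = 2011' '}' > "$dir/one.txt"
printf '%s\n' '@article{b,' 'Year = 2010' '}' > "$dir/two.txt"

# -i.bak edits each file in place and keeps the original as NAME.bak (GNU sed).
sed -s -i.bak '/^@/,/^}$/ { //d }' "$dir"/*.txt

edited_one=$(cat "$dir/one.txt")
backup_one=$(cat "$dir/one.txt.bak")
edited_two=$(cat "$dir/two.txt")
printf '%s\n' "$edited_one" "$edited_two"
rm -rf "$dir"
```

Once the edited files look right, the .bak copies can be deleted.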
Tested as follows:
> awk '{if($0!~/@/&&$0!~/^}/)print}' temp
author = {Grzegorz J. Nalepa},
title = {Semantic Knowledge Engineering. A Rule-Based Approach},
publisher = {Wydawnictwa AGH},
year = 2011,
address = {Krak\'ow}
Author = {Grzegorz J. Nalepa},
Journal = {Journal of Universal Computer Science},
Number = 7,
Pages = {1006-1023},
Title = {Collective Knowledge Engineering with Semantic Wikis},
Volume = 16,
Year = 2010
>
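Comments:
Thanks for your solution. Your example keeps the "}" at the end of the last line. See my edit and proposed solution.
Thanks for the solution, but the question still stands, i.e. a regular expression. See my edit and proposed solution.
@Tedee12345: Not being able to change awk's record separator creates more problems than it solves. Writing code around such problems is never a good idea. You should consider posting why you think keeping RS="}\n" is a good idea and, if it is, provide more sample data. Good luck.
Thanks for the solution, but the question still stands, i.e. a regular expression. See my edit and proposed solution.
Thanks again for your reply. The first solution you proposed works well for me.
This answer is almost identical to Zsolt's answer from 20 hours ago. You should consider upvoting it, as I did.
Thanks for the example solution.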