How can sed read in and process a file of unknown length?

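I want to insert a text file of unknown length into an HTML source file, marking it up (converting it to HTML) on the way in; there will be at least two lines to insert. I was going to use m4, but "include" reads in the entire file. So, on to sed.

Once the pattern marking the start of the insertion point is found, the first line is wrapped in a <div class="title"> tag, the second line likewise (but with a different class), and so on in a loop until EOF; then the rest of the source file is output.

Finding the insertion point is fine, and so is printing the rest of the source file. My problem is getting sed to loop over the text file until it is finished.

Sample input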

title1
author1
title2
author2
...
titleN
authorN

Desired output

<!-- above here is source file, below is sed'ed output -->
<div class="title">
title1
</div>
<div class="author">
author1
</div>
<div class="title">
title2
</div>
<div class="author">
author2
</div>
...
<div class="title">
titleN
</div>
<div class="author">
authorN
</div>
<!-- below is rest of source file -->


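I'm not too concerned about line breaks; a single line is fine. The example is laid out this way only to make clear what is going on.

I can get it to work with a\

Don't think of sed as merely a language for looping over lines. You can restrict a command to a range of lines by giving one pattern that matches the first line and another that matches the last: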
sed '/firstRE/,/secondRE/s/ThingsBetweenLines/ReplaceWithThis/'

For example:
[ghoti@pc ~]$ printf 'one\ntwo\nthree\nfour\nfive\n' | sed '/two/,/four/s/[ore]/_/g'
one
tw_
th___
f_u_
five
[ghoti@pc ~]$ 

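The problem is that sed is not good at inserting whole lines, and standard sed has no way of saying "the current line number is even/odd". The multi-line stuff is arcane and ugly. If I recall correctly, GNU sed does have some notation for this, but it's late, and I can never remember how to use the non-standard stuff.
So I recommend awk. :) Its code is easier to read, and it is better suited to this kind of task.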
awk '
  BEGIN {
    fmt="<div class=\"title\">%s</div>\n<div class=\"author\">%s</div>\n";
  }
  {
    title=$0; getline; author=$0;
    printf(fmt, title, author);
  }
'

Of course, you can also do this in pure shell:
#!/bin/sh

fmt="<div class=\"title\">%s</div>\n<div class=\"author\">%s</div>\n"

while read line; do
  if [ -z "$title" ]; then
    title="$line"
    continue
  fi
  author="$line"
  printf "$fmt" "$title" "$author"
  title=''
done

See, this works for me:
[ghoti@pc ~/tmp]$ printf 'title1\nauthor1\ntitle2\nauthor2\n' | ./doit
<div class="title">title1</div>
<div class="author">author1</div>
<div class="title">title2</div>
<div class="author">author2</div>
[ghoti@pc ~/tmp]$ printf 'title1\nauthor1\ntitle2\nauthor2\n' | ./doit.awk
<div class="title">title1</div>
<div class="author">author1</div>
<div class="title">title2</div>
<div class="author">author2</div>
[ghoti@pc ~/tmp]$ 
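
On the even/odd remark above: GNU sed does offer a first~step address form, which is a GNU extension and not part of POSIX, so the alternating markup can also be sketched in sed. The following is a minimal sketch that assumes GNU sed; the function name format_pairs_gnu is made up for illustration.

```shell
#!/bin/sh
# Sketch only; requires GNU sed (first~step addresses are a GNU extension).
# format_pairs_gnu is a hypothetical name, not something from the answers.
format_pairs_gnu() {
    # 1~2 selects lines 1, 3, 5, ... (titles); 2~2 selects lines 2, 4, 6, ... (authors).
    # & in the replacement stands for the whole matched line.
    sed -e '1~2s%.*%<div class="title">&</div>%' \
        -e '2~2s%.*%<div class="author">&</div>%' "$@"
}
```

Called with no arguments it filters standard input, for example printf 'title1\nauthor1\n' | format_pairs_gnu. On a sed without this extension the addresses are rejected, which is the portability concern the answer raises.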

This might work for you (GNU sed):

You have two input files. One contains:
some text
insertion point pattern
rest of the text

plus a second file containing the list of alternating title and author lines.
The output should be:
some text
insertion point pattern
...alternating list of title and author <div>s
rest of the text
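
The two-step approach this answer describes (format the pairs into a temporary file, then read that file in at the insertion point) can be sketched roughly as follows. This is a reconstruction under assumptions, not the answer's own script: the function name insert_pairs and its two arguments are hypothetical, and the newline in the replacement is written as backslash-newline so that the sketch does not depend on GNU sed's \n in the replacement text.

```shell
#!/bin/sh
# Sketch only. insert_pairs PAIRS MAIN (hypothetical name and arguments):
# formats the title/author pairs in PAIRS, then writes MAIN to standard
# output with the formatted block placed after the insertion-point line.
insert_pairs() {
    pairs=$1
    main=$2
    tmp=${TMPDIR:-/tmp}/at.$$

    # Clean up if sent HUP, INT, QUIT, PIPE or TERM.
    trap 'rm -f "$tmp"; exit 1' 1 2 3 13 15

    # First sed: N joins each title with the following author line, then the
    # substitution wraps the text on either side of the newline (\1 and \2).
    sed -e 'N' \
        -e 's%\(.*\)\n\(.*\)%<div class="title">\1</div>\
<div class="author">\2</div>%' "$pairs" > "$tmp"

    # Second sed: double quotes let the shell expand $tmp; the r command
    # copies the preprocessed file out after the matching line is printed.
    sed "/insertion point pattern/r $tmp" "$main"

    rm -f "$tmp"
    trap - 1 2 3 13 15
}
```

A call such as insert_pairs title.authors main-file > output-file would produce the output shown above, assuming the pairs file holds an even number of lines.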

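The trap command ensures that the script cleans up after itself if it is sent a HUP, INT, QUIT, PIPE or TERM signal.
The first sed script uses N to combine adjacent lines, so that it has a title and an author on two lines in the pattern space. The other command then captures the material on either side of the newline into \1 and \2 and tags them.
The second sed script identifies the insertion point, prints that line, then reads in the preprocessed file of titles and authors before reading the next line (note the double quotes, which let the shell expand $tmp).
Needing a temporary file is a mild nuisance, but doing it this way cleanly separates two different duties: "format the title and author information" and "copy the formatted title and author information to the right place in the data stream".
If you need the marker HTML/XML comments in the output, you can complicate the preprocessing script with: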
   -e '1i\
      <!-- above here is source file, below is sed'\''ed output -->' \
   -e '$a\
      <!-- below is rest of source file -->'

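Alternatively, the sed commands can live in a separate script file run with sed -f. The downside is the extra file, the sed script. You could, of course, generate it on the fly as another temporary file. My trick for that is: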
tmp=${TMPDIR:-/tmp}/at.$$
trap "rm -f $tmp.?; exit 1" 0 1 2 3 13 15

# $a is placed after N so that $ is tested once the last (author) line has been read
cat > $tmp.1 <<'EOF'
1i\
<!-- above here is source file, below is sed'ed output -->
N
$a\
<!-- below is rest of source file -->
s%\(.*\)\n\(.*\)%<div class="title">\1</div>\n<div class="author">\2</div>%
EOF

sed -f $tmp.1 title.authors > $tmp.2

sed "/insertion point pattern/r $tmp.2" main-file > output-file

rm -f $tmp.?
trap 0

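The change is to use the generated temporary name as a prefix; the actual temporary files are $tmp.1 and $tmp.2. The cleanup differs only slightly, to reflect that there may be several temporary files to remove.
Obviously, you could arrange for the two input files to be arguments to the script, and have the script simply write to standard output, so that you can redirect its output wherever you need instead of forcing it into output-file. In fact, a general-purpose script should do exactly that.

This isn't a job for sed, it's a job for awk: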
awk 'NR==FNR{a[NR]=$0; next} {print} /<div class=/{print a[++c]}' file1.txt file2.html
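
The one-liner relies on the NR==FNR idiom to tell the first file from the second. Unpacked with comments, it might look like the sketch below. The function name fill_template is hypothetical, and the sketch assumes, as the one-liner appears to, that the second file is a template that already contains one opening div line for every title or author line.

```shell
#!/bin/sh
# Sketch only. fill_template LINES TEMPLATE (hypothetical name):
# prints TEMPLATE, adding the next line of LINES after each opening div.
fill_template() {
    awk '
        # NR == FNR is true only while the first file is being read
        # (provided that file is not empty): store its lines and move on.
        NR == FNR { a[NR] = $0; next }
        # Second file: print every line unchanged...
        { print }
        # ...and after each line containing <div class=, print the next stored line.
        /<div class=/ { print a[++c] }
    ' "$1" "$2"
}
```

For example, fill_template file1.txt file2.html reproduces the command above. Because the stored lines are consumed strictly in order, the titles and authors land in the right places only if the template's div lines alternate in the same order.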

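Comments:

HTML and regex are involved here; be warned.

How do you identify which line is a title and which is an author? Or are all odd lines titles and all even lines authors?

@AndrewMarshall Thanks. That link was an interesting read. In practice I'm only testing for a non-blank line, not for any markup, but point taken.

@ghoti The file is defined as (title\nauthor\n){1,}, so in each couplet the first line is the title and the second is the author.

Your couplets.sed script is well named and effective; well done. The rest of the script is impenetrable, and I can't see what you expect it to do.

I'd like to upvote, but I can't yet.

@JonathanLeffler I've added an explanation to the solution.

Oh, I think I get it... the e command in GNU sed means "execute what follows as a shell script, with its standard output going into the output of the main sed script and its standard input coming from /dev/null, or somewhere like that". Yuck, equally icky. But if GNU sed supports it, I suppose that gives it some legitimacy. It won't work with any normal sed, and I'm not sure I like it... but maybe I'm just too old-fashioned.

@JonathanLeffler The e command was introduced around 2002, in version 3.95 of GNU sed.

It still isn't in POSIX, and I stick to POSIX rather than tuning my experience to GNU precisely so that I don't run into problems on systems that don't support it.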