Text: Sed/Awk paragraph formatting solution

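I need to create paragraphs from run-together text in which most of the carriage returns and/or line feeds have been removed. Dialog is interspersed in the text, and the quotations seem to set off the paragraphs I want to rebuild. So I want to insert a blank line around each quotation, i.e. before the opening quote mark and after the second (closing) one. I added leading slashes (they are not in the actual text) because I don't know this site's convention for quoting code. Here is an example: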

From this:

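Bacon ipsum dolor amet pastrami chuck venison swine, salami prosciutto shank pork belly. Filet mignon beef ribs ham hock, bacon ground round porchetta alcatra. Beef bacon biltong bresaola short loin filet mignon "I want bacon." chuck brisket landjaeger jerky prosciutto ham leberkas pork loin doner. Shoulder tongue meatball tail jerky pork loin filet "I want bacon." mignon shank chuck shankle flank pig. Short loin pork loin hamburger corned beef ribeye tri-tip doner ham hock landjaeger t-bone swine. Swine pork belly frankfurter, t-bone ham hock bacon pastrami. Biltong beef chuck ham hock pork loin shoulder strip "I want bacon." steak short loin tail cupim rump alcatra.Shoulder beef cupim rump ground round. Beef sirloin cupim meatball ham ribeye. "I want bacon." Venison tail ribeye, pastrami tongue pig beef ribs kielbasa bresaola doner. Shankle filet mignon pig, shoulder ball tip pork belly jowl sausage fatback boudin. Prosciutto venison capicola bacon, short loin andouille salami shank tongue corned beef. Sirloin biltong boudin tenderloin brisket tri-tip pancetta kielbasa strip steak leberkas short ribs flank filet mignon ham hock pork. Tri-tip cupim "I want bacon." "I want bacon."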

To this:

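Bacon ipsum dolor amet pastrami chuck venison swine, salami prosciutto shank pork belly. Filet mignon beef ribs ham hock, bacon ground round porchetta alcatra. Beef bacon biltong bresaola short loin filet mignon

"I want bacon."

chuck brisket landjaeger jerky prosciutto ham leberkas pork loin doner. Shoulder tongue meatball tail jerky pork loin filet

"I want bacon."

mignon shank chuck shankle flank pig. Short loin pork loin hamburger corned beef ribeye tri-tip doner ham hock landjaeger t-bone swine. Swine pork belly frankfurter, t-bone ham hock bacon pastrami. Biltong beef chuck ham hock pork loin shoulder strip

"I want bacon."

steak short loin tail cupim rump alcatra.Shoulder beef cupim rump ground round. Beef sirloin cupim meatball ham ribeye.

"I want bacon."

Venison tail ribeye, pastrami tongue pig beef ribs kielbasa bresaola doner. Shankle filet mignon pig, shoulder ball tip pork belly jowl sausage fatback boudin. Prosciutto venison capicola bacon, short loin andouille salami shank tongue corned beef. Sirloin biltong boudin tenderloin brisket tri-tip pancetta kielbasa strip steak leberkas short ribs flank filet mignon ham hock pork. Tri-tip cupim

"I want bacon."

"I want bacon."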

Try this:

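awk 'BEGIN { RS = " ?\" ?"; ORS = "\n\n" }    # regex RS: needs GNU awk or mawk
     NR % 2 == 0 { print "\"" $0 "\""; next } # even records: the quoted text
     1                                        # odd records: print as-is
    ' inputFile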
This inserts a new paragraph before and after each quote ("…"). However, the last few paragraphs then look like this:

"I want bacon."



"I want bacon."
To remove the empty paragraph between the two "I want bacon." lines:
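One possible way to drop that empty paragraph (a sketch, not necessarily what the answerer used): keep the regex RS from the command above, but print an odd record (text outside the quotes) only when NF is non-zero, i.e. when it contains something other than whitespace. This assumes an awk that treats a multi-character RS as a regular expression (e.g. GNU awk or mawk); the function name quote_paragraphs_re is hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical function name). Assumes an awk that treats a
# multi-character RS as a regular expression, e.g. GNU awk or mawk.
quote_paragraphs_re() {
    awk 'BEGIN { RS = " ?\" ?"; ORS = "\n\n" }    # split at a quote mark plus one optional space on each side
         NR % 2 == 0 { print "\"" $0 "\""; next } # even records: the quoted text
         NF                                       # odd records: print unless only whitespace
        ' "$@"
}
# usage: quote_paragraphs_re inputFile > outputFile
```

Two adjacent quotes produce an empty record between them, and the trailing newline of the file ends up in a final whitespace-only record; the NF test skips both.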

sed might be easier (this uses GNU sed, which treats \n in the replacement as a newline):

$ sed 's/"[^"]*" /\n\n&\n\n/g' bacon
For example:

$ echo "bla bla bla \"This is bacon.\" Starts a new paragraph" | sed 's/"[^"]*" /\n\n&\n\n/g'
bla bla bla

"This is bacon."

Starts a new paragraph
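The sed command above only matches a quote that is followed by a space, so a quote at the very end of the text is not split off, and the spaces next to each quote stay at the ends of the surrounding lines. A sketch of a possible GNU sed refinement (assuming GNU sed for -E and for \n in both the regex and the replacement; split_quotes_gnu is a hypothetical name) trims those spaces and then squeezes the extra blank lines that two adjacent quotes would otherwise leave:

```shell
#!/bin/sh
# Sketch (hypothetical function name). GNU sed assumed: -E, \n in the
# regex and in the replacement are not guaranteed by POSIX.
split_quotes_gnu() {
    # 1st expression: put each quote, without its surrounding spaces,
    #                 on its own line with a blank line before and after.
    # 2nd expression: two adjacent quotes leave 4 newlines in a row;
    #                 collapse any run of 3+ newlines to one blank line.
    sed -E -e 's/[[:space:]]*("[^"]*")[[:space:]]*/\n\n\1\n\n/g' \
           -e 's/\n{3,}/\n\n/g' "$@"
}
# usage: split_quotes_gnu bacon > bacon.out
```

Like the original, it works line by line, so this assumes each run-together paragraph sits on a single input line.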

With GNU awk, for multi-char RS and gensub():

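Why aren't the last two quotes on separate lines?
Fixed. The text was on separate lines in the input box, but the output box needed an extra carriage return.
This doesn't add blank lines before and after the dialog, but it does print the dialog on separate lines. The problem now is that without the blank lines the dialog isn't treated as a separate paragraph.
Oh, I missed that requirement. I've corrected my answer; I just had to replace ORS="\n" with ORS="\n\n". Now every quote is preceded and followed by a newline and a blank line.
Got: awk: syntax error at source line 1 context is BEGIN{RS="\ ?\"\ ?" >>> ORS=
Sorry, a ; was missing after RS="\ ?\"\ ?".
Nice! Just one remark: the "odd" paragraphs start with a space (except the first one), and there is no blank line between the last two "I want bacon." lines.
This doesn't work. The file isn't modified, whether I run it through sed or redirect the output to another file (> test.txt).
It writes to stdout. I added an example. Maybe you are using an incompatible sed.
Note that a \n in the replacement of a sed script produces undefined behavior per POSIX; could that be the problem? If you use a shell (e.g. bash) that lets you write a newline as $'\n', consider using a backslash followed by $'\n' for each desired newline.
@Ed Morton, maybe; I don't have one to test with, but sed --posix ... also works for me.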
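The POSIX point raised in the comments can be illustrated with a sketch of the sed answer in which every newline in the replacement is written as a backslash followed by a literal newline, which POSIX sed accepts in the replacement of s///. Instead of bash's $'\n', it keeps a newline in a shell variable, so it should also run under a plain POSIX sh. The function name split_quotes and the variable nl are hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical names): POSIX-portable form of the sed answer.
# Each newline in the replacement is a backslash followed by a real
# newline; the GNU-only "\n" in the replacement is not used.
# Like the original, it only matches a quote that is followed by a space.
split_quotes() {
    nl='
'   # a variable holding a single newline (instead of bash-only $'\n')
    # Inside double quotes: \" -> "  and  \\ -> \  so each \\${nl} becomes
    # a backslash followed by a literal newline in the sed script.
    sed "s/\"[^\"]*\" /\\${nl}\\${nl}&\\${nl}\\${nl}/g" "$@"
}
# usage: split_quotes bacon > bacon.out
```

As with the original command, the result goes to standard output; redirect it to keep it.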
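awk -v RS='"' '{
    if (NR % 2 == 1) {
        # odd records: text outside the quotes; skip it if it is only whitespace,
        # otherwise print it after an extra newline (except for the first record)
        if (/[^[:space:]]/) printf "%s%s\n\n", (NR==1 ? "" : "\n"), $0
    } else {
        # even records: text inside the quotes; put the quote marks back
        printf "\"%s\"\n", $0
    }
}' file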
Bacon ipsum dolor amet pastrami chuck venison swine, salami prosciutto shank pork belly. Filet mignon beef ribs ham hock, bacon ground round porchetta alcatra. Beef bacon biltong bresaola short loin filet mignon 

"I want bacon."

 chuck brisket landjaeger jerky prosciutto ham leberkas pork loin doner. Shoulder tongue meatball tail jerky pork loin filet 

"I want bacon."

 mignon shank chuck shankle flank pig. Short loin pork loin hamburger corned beef ribeye tri-tip doner ham hock landjaeger t-bone swine. Swine pork belly frankfurter, t-bone ham hock bacon pastrami. Biltong beef chuck ham hock pork loin shoulder strip 

"I want bacon."

steak short loin tail cupim rump alcatra.Shoulder beef cupim rump ground round. Beef sirloin cupim meatball ham ribeye. 

"I want bacon."

 Venison tail ribeye, pastrami tongue pig beef ribs kielbasa bresaola doner. Shankle filet mignon pig, shoulder ball tip pork belly jowl sausage fatback boudin. Prosciutto venison capicola bacon, short loin andouille salami shank tongue corned beef. Sirloin biltong boudin tenderloin brisket tri-tip pancetta kielbasa strip steak leberkas short ribs flank filet mignon ham hock pork. Tri-tip cupim 

"I want bacon."
"I want bacon."
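The remark in the comments (odd paragraphs start with a space, no blank line between the last two quotes) can be addressed in the same single-character-RS style. The sketch below is not the answerer's code, and quote_paragraphs is a hypothetical name: it trims the whitespace around the text outside the quotes, skips what is left empty, and prints exactly one blank line between consecutive paragraphs. Because RS is a single character, it should not need GNU awk.

```shell
#!/bin/sh
# Sketch (hypothetical function name). Single-character RS, so it
# should work with any POSIX awk, not only GNU awk.
quote_paragraphs() {
    awk -v RS='"' '
        NR % 2 == 0 { out = "\"" $0 "\"" }                 # even records: quoted text
        NR % 2 == 1 {                                       # odd records: text between quotes
            sub(/^[ \t\n]+/, ""); sub(/[ \t\n]+$/, "")      # trim surrounding whitespace
            if ($0 == "") next                              # nothing between two quotes
            out = $0
        }
        { printf "%s%s", (n++ ? "\n\n" : ""), out }         # one blank line between paragraphs
        END { if (n) print "" }                             # terminate the last paragraph
    ' "$@"
}
# usage: quote_paragraphs file > file.out
```

Like the answers above, it writes to standard output and leaves the input file untouched.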
$ awk -v RS='^$' -v ORS= '{$0=gensub(/\s*("[^"]+")\s*/,"\n\n\\1\n\n","g"); gsub(/\n+/,"\n\n")}1' file
Bacon ipsum dolor amet pastrami chuck venison swine, salami prosciutto shank pork belly. Filet mignon beef ribs ham hock, bacon ground round porchetta alcatra. Beef bacon biltong bresaola short loin filet mignon

"I want bacon."

chuck brisket landjaeger jerky prosciutto ham leberkas pork loin doner. Shoulder tongue meatball tail jerky pork loin filet

"I want bacon."

mignon shank chuck shankle flank pig. Short loin pork loin hamburger corned beef ribeye tri-tip doner ham hock landjaeger t-bone swine. Swine pork belly frankfurter, t-bone ham hock bacon pastrami. Biltong beef chuck ham hock pork loin shoulder strip

"I want bacon."

steak short loin tail cupim rump alcatra.Shoulder beef cupim rump ground round. Beef sirloin cupim meatball ham ribeye.

"I want bacon."

Venison tail ribeye, pastrami tongue pig beef ribs kielbasa bresaola doner. Shankle filet mignon pig, shoulder ball tip pork belly jowl sausage fatback boudin. Prosciutto venison capicola bacon, short loin andouille salami shank tongue corned beef. Sirloin biltong boudin tenderloin brisket tri-tip pancetta kielbasa strip steak leberkas short ribs flank filet mignon ham hock pork. Tri-tip cupim

"I want bacon."

"I want bacon."