Regex to replace newlines inside quoted strings with \n

regex linux perl scripting awk

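I need to write a quick (by tomorrow) filter script that replaces line breaks (LF or CRLF) inside double-quoted strings with the escaped newline \n. The content is a (broken) JavaScript program, so I need to allow escape sequences inside strings, such as "ab\"cd" and "ab\\"cd"ef".

I understand that sed is not well suited to the job because it works line by line, so I turned to perl, of which I know nothing :)

I've written this regex:

"(((\\.)|[^"\\\n])*\n?)*"

and tested it. It does match quoted strings that contain line breaks; however,

perl -p -e 's/"(((\\.)|[^"\\\n])*(\n)?)*"/TEST/g'

does not.

So my questions are:

How do I make perl match line breaks?

How do I write the "replace by" part so that it keeps the original string and replaces only the newlines?

There is a similar question with an awk solution, but it is not exactly what I need.

Note: I usually don't ask "please do this for me" questions, but I really don't feel like learning perl/awk by tomorrow... :)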

Edit: sample data

"abc\"def" - matches as one string
"abc\\"def"xy" - match "abc\\" and "xy"
"ab
cd
ef" - is replaced by "ab\ncd\nef"

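Until the OP posts some sample content to test against, try adding the "m" (and possibly "s") flag to the end of the regex.

For testing, you may also find it useful to add the command-line argument "-i.bak", so that a backup of the original file is kept (with the extension ".bak").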
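
One likely reason the one-liner in the question never matches a line break inside a string: with -p, perl reads the input one line at a time, so the pattern is never applied across a newline, whatever flags are added. Below is a minimal sketch of reading the whole input as a single string first. The helper name slurp_text and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Read an entire filehandle into one string by disabling the input
# record separator ($/). The -0777 switch does the same for -p and -n.
sub slurp_text {
    my ($fh) = @_;
    local $/ = undef;          # no record separator: read everything
    my $text = <$fh>;
    return defined $text ? $text : '';
}

# Demo: an in-memory filehandle stands in for a real file.
my $sample = qq{var s = "ab\ncd";\n};
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my $whole = slurp_text($fh);
close $fh;

# With the whole text in one string, a pattern can span the line break.
my $spans = $whole =~ /"[^"]*\n[^"]*"/ ? 1 : 0;
print "pattern spans a newline: $spans\n";
```

On the command line, the same effect comes from adding -0777 to the switches, as in perl -0777 -pe '...' file.js.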

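Also note that if you want to group something without capturing it, you can use (?:PATTERN) instead of (PATTERN). Once something has been captured, use $1 through $9 to access the stored matches.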
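
To illustrate the difference between the two kinds of group, here is a small sketch. The sample line and variable names are assumptions chosen for the example.

```perl
use strict;
use warnings;

# Sample input: a JavaScript-like line holding one quoted string.
my $src = 'x = "ab\"cd";';

# The outer (...) captures the string body into $1.
# The inner (?:...) only groups the alternation and stores nothing,
# so the body is still $1 rather than $2.
my $body;
if ($src =~ /"((?:[^"\\]|\\.)*)"/) {
    $body = $1;
}

print "captured: $body\n";
```

Because the inner group is non-capturing, adding or removing it does not renumber the captures used in the replacement.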

For more information, see the Perl documentation on regular expressions.

Here is a simple Perl solution:

s§
    \G # match from the beginning of the string or the last match
    ([^"]*+) # till we get to a quote
    "((?:[^"\\]++|\\.)*+)" # match the whole quote
§
    $a = $1;
    $b = $2;
    $b =~ s/\r?\n/\\n/g; # replace what you want inside the quote
    "$a\"$b\"";
§gex;
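
A minimal sketch of the same match-then-rewrite idea wrapped in a function, using brace delimiters instead of § and plain (non-possessive) quantifiers, so it should also run on older perls. The function name is made up, and the sketch deliberately ignores single-quoted strings, comments and regex literals in the JavaScript input.

```perl
use strict;
use warnings;

# Replace LF or CRLF inside double-quoted strings with a literal \n.
# Text outside the quotes is left untouched.
# The /s flag lets the escape branch also cover a backslash that is
# directly followed by a line break.
sub escape_newlines_in_strings {
    my ($text) = @_;
    $text =~ s{
        "((?:[^"\\]|\\.)*)"    # one double-quoted string, escapes skipped pairwise
    }{
        my $body = $1;             # copy first: the inner s/// resets $1
        $body =~ s/\r?\n/\\n/g;    # rewrite only the line breaks
        qq{"$body"};
    }gsex;
    return $text;
}

# Demo on the multi-line sample from the question.
print escape_newlines_in_strings(qq{"ab\ncd\nef"\n});   # prints "ab\ncd\nef"
```

The input has to be a single string holding the whole file (slurped), otherwise the pattern never sees a string that spans several lines.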

If you'd rather not use /e and want to do it with a single regex, here is another solution:

use strict;

$_=<<'_quote_';
hai xtest "aa xx aax" baix "xx"
x "axa\"x\\" xa "x\\\\\"x" ax
xbai!x
_quote_

print "Original:\n", $_, "\n";

s/
(
    (?:
        # at the beginning of the string match till inside the quotes
        ^(?&outside_quote) "
        # or continue from last match which always stops inside quotes
        | (?!^)\G
    )
    (?&inside_quote)  # eat things up till we find what we want
)
x   # the thing we want to replace
(
    (?&inside_quote)  # eat more possibly till end of quote
    # if going out of quote make sure the match stops inside them
    # or at the end of string
    (?: " (?&outside_quote) (?:"|\z) )?
)

(?(DEFINE)
    (?<outside_quote> [^"]*+ ) # just eat everything till quoting starts
    (?<inside_quote> (?:[^"\\x]++|\\.)*+ ) # handle escapes
)
/$1Y$2/xg;

print "Replaced:\n", $_, "\n";

To replace newlines instead of x, just swap it in the regex, like this:

s/
(
    (?:
        # at the beginning of the string match till inside the quotes
        ^(?&outside_quote) "
        # or continue from last match which always stops inside quotes
        | (?!^)\G
    )
    (?&inside_quote)  # eat things up till we find what we want
)
\r?\n # the thing we want to replace
(
    (?&inside_quote)  # eat more possibly till end of quote
    # if going out of quote make sure the match stops inside them
    # or at the end of string
    (?: " (?&outside_quote) (?:"|\z) )?
)

(?(DEFINE)
    (?<outside_quote> [^"]*+ ) # just eat everything till quoting starts
    (?<inside_quote> (?:[^"\\\r\n]++|\\.)*+ ) # handle escapes
)
/$1\\n$2/xg;

Using Perl 5.14.0, you could do this:

#!/usr/bin/env perl

use strict;
use warnings;

use 5.14.0;

use Regexp::Common qw/delimited/;

my $data = <<'END';
"abc\"def"
"abc\\"def"xy"
"ab
cd
ef"
END

my $output = $data =~ s/$RE{delimited}{-delim=>'"'}{-keep}/$1=~s!\n!\\n!rg/egr;

print $output;
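
The /r modifier used above (available from Perl 5.14) makes s/// return the modified copy instead of changing its target, which is what lets the inner substitution run on the read-only $1. A small sketch with made-up variable names:

```perl
use strict;
use warnings;
use 5.014;   # s///r needs Perl 5.14 or newer

my $original = "ab\ncd\n";

# With /r the substitution returns a new string; $original is untouched.
my $escaped = $original =~ s/\n/\\n/gr;

print "$escaped\n";
```

On perls older than 5.14, the equivalent is to copy the value into a variable first and run a plain s///g on the copy.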

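Double-quoted strings in what context?

Well, JavaScript, but I don't think that's relevant. I don't need full parsing, only to recognize string literals.

"Handling" \" and \\" could mean that you want the string expanded twice. Or that you want to keep a backslash that happens to sit right before the closing ". Since you haven't given any desired output beyond "handle correctly", I can only guess what "handle correctly" means to you.

@davka, could you post sample content to try it on? Thanks.

@TLP, I know. @Joel, will edit.

%$^#, while I was working on this, tadmc and Qtax both beat me to it!

Remember to use utf8 if you use § as a regex delimiter!

Thanks! The first solution gives the following error: Nested quantifiers in regex; marked by ...

@bdonlan, § is in Latin-1, so there's no need for use utf8 (unless, I guess, that's how you encode your file).

@davka, update your perl; the one you're using must be ancient. Or you can remove every + that immediately follows another quantifier (e.g. after +, *).

Why do you assume that my (or the OP's) default editor charset is Latin-1? :) You can find a non-Unicode charset that contains almost any character, but assuming people are using it isn't a good idea, especially now that most distros default to utf8.

Can't locate Regexp/Common.pm. I guess this is an add-on?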

#!/usr/bin/perl
use warnings;
use strict;
use Regexp::Common;

$_ = '"abc\"def"' . '"abc\\\\"def"xy"' . qq("ab\ncd\nef");

print "befor: {{$_}}\n";
s{($RE{quoted})}
 {  (my $x=$1) =~ s/\n/\\n/g;
    $x
 }ge;
print "after: {{$_}}\n";
