Match any character (including newlines) in sed

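I have a sed command that I want to run on a huge, terrible, ugly HTML file that was created from a Microsoft Word document. All it should do is remove any instance of the string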

style='text-align:center; color:blue;
exampleStyle:exampleValue'

The sed command that I am trying to modify is

sed "s/ style='[^']*'//" fileA > fileB

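It works nicely, except that whenever there is a newline inside the text to be matched, it does not match. Is there a modifier for sed, or something else I can do, to force matching of any character, including newlines?

I know that regexps are terrible at XML and HTML, and so on, but in this case the string patterns are well-formed: the style attribute always starts with a single quote and ends with a single quote. So if I could just solve the newline problem, I could cut the size of the HTML by more than 50% with that one command.

In the end, it turned out that Sinan Ünür's Perl script worked best. It was almost instantaneous, and it reduced the file size from 2.3 MB to 850k. Good ol' Perl...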

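sed goes over the input file line by line, which means, as I understand it, that what you want is not possible in sed.

You can, however, use the following Perl script (untested):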

#!/usr/bin/perl

use strict;
use warnings;

{
    local $/; # slurp mode: read the whole input as one string
    my $html = <>;
    $html =~ s/ style='[^']*'//g;
    print $html;
}

__END__

A one-liner would be:

$ perl -e 'local $/; $_ = <>; s/ style=\047[^\047]*\047//g; print' fileA > fileB

You can use tr to remove all the CR/LF characters, run sed, and then import the result into an editor that auto-formats.
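
A rough sketch of this idea as a shell function. The name strip_styles_tr is made up for illustration, and the sketch turns line feeds into spaces instead of deleting them outright, on the assumption that you do not want text from adjacent lines glued together:

```shell
# Hypothetical helper (not from the answer): reads stdin, writes stdout.
strip_styles_tr() {
    # 1. drop carriage returns
    # 2. turn every line feed into a space, so the document is one long line
    # 3. strip every style attribute from that single line
    tr -d '\r' | tr '\n' ' ' | sed "s/ style='[^']*'//g"
}

# Possible usage with the file names from the question:
#   strip_styles_tr < fileA > fileB
```

The output has no line breaks left at all, which is why the answer suggests reformatting it in an editor afterwards.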

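sed reads its input line by line, so processing text that spans more than one line is not simple... but it is not impossible either: you need to make use of sed branching. The following will work; I have commented it to explain what is going on (not the most readable syntax!):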
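
A minimal sketch of the branching technique, not necessarily the script this answer had in mind. It assumes the whole file fits in memory, because it collects every line in the pattern space before substituting; the function name strip_styles_sed is hypothetical:

```shell
# Hypothetical helper: reads stdin, writes stdout.
strip_styles_sed() {
    # :a     define a label named "a"
    # $!{    on every line except the last one:
    #   N      append the next input line to the pattern space
    #   ba     branch back to label "a"
    # }
    # s///g  once all lines are collected, delete every style attribute;
    #        [^'] also matches the newlines embedded in the pattern space
    sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' -e "s/ style='[^']*'//g"
}

# Possible usage with the file names from the question:
#   strip_styles_sed < fileA > fileB
```

Unlike the tr approach, this keeps every line break that is not inside a removed attribute.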

You can try this:

awk '/style/&&/exampleValue/{
    gsub(/style.*exampleValue\047/,"")
}
/style/&&!/exampleValue/{     
    gsub(/style.* /,"")
    f=1        
}
f &&/exampleValue/{  
  gsub(/.*exampleValue\047 /,"")
  f=0
}
1
' file

Output:

# more file
this is a line
    style='text-align:center; color:blue; exampleStyle:exampleValue'
this is a line
blah
blah
style='text-align:center; color:blue;
exampleStyle:exampleValue' blah blah....

# ./test.sh
this is a line

this is a line
blah
blah
blah blah....

Another way is:

$ cat toreplace.txt 
I want to make \
this into one line

I also want to \
merge this line

$ sed -e 'N;N;s/\\\n//g;P;D;' toreplace.txt

Output:

I want to make this into one line

I also want to merge this line

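N loads another line into the pattern space,

P prints the pattern space up to the first newline,

D deletes the pattern space up to the first newline and, if anything is left, starts the next cycle without reading new input.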
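
As a variation (a sketch, not part of the answer above), a labelled loop can keep appending lines for as long as the current one ends in a backslash, instead of relying on a fixed window of lines. The function name join_continuations is made up for illustration:

```shell
# Hypothetical helper: reads stdin, writes stdout.
join_continuations() {
    # :a          define a label named "a"
    # /\\$/N      if the pattern space ends in a backslash, append the next line
    # s/\\\n//    remove the backslash together with the newline that follows it
    # ta          if that substitution succeeded, branch back to "a" and re-check
    sed -e ':a' -e '/\\$/N' -e 's/\\\n//' -e 'ta'
}

# Possible usage with the sample file from the answer:
#   join_continuations < toreplace.txt
```

This also handles runs of three or more continuation lines, since the loop repeats until the joined line no longer ends in a backslash.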

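Deleting XML elements across multiple lines

My use case was basically the same, but I needed to match the opening and closing tags of an XML element and delete them entirely, including everything inside them.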

<xmlTag whatever="parameter that holds in the tag header">
    <whatever_is_inside/>
    <InWhicheverFormat>
        <AcrossSeveralLines/>
    </InWhicheverFormat>
</xmlTag>

Some explanations:

I match <xmlTag without the closing >, because my XML element contains parameters.

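sed is line-based. That is the main sticking point here. There may be a command-line option to make it read the file as one "line" if you use the /g regex modifier, but I doubt it (memory concerns and so on).

There is no option (that I know of) to read the file as one "line". I would use Perl for this.

But sed does have ways to append new lines to the pattern space and the hold space, so multi-line processing is possible in sed; it just isn't pretty.

(I merged your answer into the question; if Sinan's answer solved your problem, click the check mark to mark it as accepted.)

This gets my vote for the answer. The progression of languages is sed -> awk -> C/C++/Ada. Start on the left and move right until you have enough power to do the job.

Probably not C/C++/Ada. In my opinion it would be Python/Perl/Ruby and so on, at least for system administration tasks.

For the XML element removal described above, the sed command is: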
```
# Requires GNU sed: -s, --in-place and \+ are GNU extensions.
sed -s --in-place=.back -e '/\(^[ ]*\)<xmlTag/{   # whenever you encounter the xmlTag
       $! {                                       # unless this is the last line, do
            :begin                                # label to return to
            N;                                    # append next line
            s/\(^[ ]*\)<\(xmlTag\)[^·]\+<\/\2>//; # attempt substitution (elimination) of pattern
            t end                                 # if substitution succeeds, jump to :end
            b begin                               # unconditional jump to :begin to append yet another line
            :end                                  # label to mark the end
          }
       }'  myxmlfile.xml
```