Regex: filtering text in HTML and printing the parsed fields (regex, sed, awk)


I have an OPML file, and I want to parse out the links and names so that I can build an HTML-formatted list.

<outline text="Wired Features" type="rss" xmlUrl="http://downloads.wired.com/podcasts/xml/features.xml?_kip_ipx=1854665749-1310493405" htmlUrl="http://www.wired.com" />
<outline text="ArcSight Podcasts" type="rss" xmlUrl="http://www.arcsight.com/podcasts/itunes/" htmlUrl="http://www.arcsight.com" />

Using sed or a similar tool, I would like each item printed in the corresponding HTML form, i.e.

<a href="http://downloads.wired.com/podcasts/xml/features.xml?_kip_ipx=1854665749-1310493405" title="http://www.wired.com">Wired Features</a>

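perl -nle'
  # Pull each attribute out separately, so attribute order does not matter.
  ($text)  = /text="(.*?)"/;
  ($url)   = /xmlUrl=(".*?")/;
  ($title) = /htmlUrl=(".*?")/;
  # Print only when all three attributes were found on the line.
  defined $text and defined $url and defined $title
    and printf "<a href=%s title=%s>%s</a>\n",
      $url, $title, $text;
  ' infile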

This assumes there are no embedded newlines within the parts of interest.

With xgawk:

xgawk -lxml 'XMLSTARTELEM {
  printf "<a href=%s title=%s>%s</a>\n",
    q XMLATTR["xmlUrl"] q, q XMLATTR["htmlUrl"] q, XMLATTR["text"]
  }' q=\" infile

Edit: the Perl solution can be rewritten with a single regular expression:

perl -nle'
  /text="(.*?)".*xmlUrl=(".*?").*htmlUrl=(".*?")/
    and printf "<a href=%s title=%s>%s</a>\n",
     $2, $3, $1; 
  ' infile
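
The single-regex form relies on the attributes appearing in the order text, xmlUrl, htmlUrl, as they do in the sample. If a feed writes them in another order (an assumption, not something the question states), a per-attribute lookup is more forgiving. A minimal sketch in POSIX awk follows; the names outline_links and attr are hypothetical helpers introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read OPML lines on stdin and print one <a> link per
# <outline> line, regardless of the order of its attributes.
outline_links() {
  awk '
    # Return the value of attribute "name" in "line", without the quotes.
    # The leading space keeps xmlUrl from matching inside htmlUrl.
    function attr(line, name,    s) {
      if (!match(line, " " name "=\"[^\"]*\"")) return ""
      s = substr(line, RSTART, RLENGTH)
      sub(/^[^"]*"/, "", s)   # drop everything up to the opening quote
      sub(/"$/, "", s)        # drop the closing quote
      return s
    }
    /<outline / {
      text = attr($0, "text")
      url  = attr($0, "xmlUrl")
      site = attr($0, "htmlUrl")
      # Skip outline elements that lack a name or a feed URL.
      if (text != "" && url != "")
        printf "<a href=\"%s\" title=\"%s\">%s</a>\n", url, site, text
    }
  '
}

# Example: the ArcSight line from the question with its attributes reordered.
printf '%s\n' '<outline htmlUrl="http://www.arcsight.com" xmlUrl="http://www.arcsight.com/podcasts/itunes/" type="rss" text="ArcSight Podcasts" />' | outline_links
```

Like the Perl version, this still assumes each outline element sits on a single line and that attribute values contain no double quotes.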


This sed solution might work:

sed 's/^<outline text="\([^"]*\)" type="rss" xmlUrl=\("[^"]*"\) htmlUrl=\("[^"]*"\) \/>/<a href=\2 title=\3>\1<\/a>/' input_file

Comment on the sed 's/^...' answer: it printed the line, but did not parse and reprint it in the format I wanted. Thanks.
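
The thread does not say why the sed command left the line unchanged. One possible cause (an assumption) is that the outline elements are indented in the real file, so the ^<outline anchor never matches and sed prints the input as is. The sketch below tolerates leading whitespace and uses -n with the p flag so that only converted lines are printed; opml_to_links is a hypothetical function name, and the attribute order from the question is still assumed.

```shell
#!/bin/sh
# Hypothetical helper: convert <outline .../> lines on stdin into <a> links.
# Assumes the attribute order text, type, xmlUrl, htmlUrl shown in the question.
opml_to_links() {
  # -n suppresses default output; the trailing p prints only lines where the
  # substitution succeeded, so non-matching lines are dropped, not echoed.
  sed -n 's/^[[:space:]]*<outline text="\([^"]*\)" type="rss" xmlUrl=\("[^"]*"\) htmlUrl=\("[^"]*"\) *\/>.*$/<a href=\2 title=\3>\1<\/a>/p'
}

# Example: an indented line, as it might appear inside a real OPML body.
printf '%s\n' '    <outline text="ArcSight Podcasts" type="rss" xmlUrl="http://www.arcsight.com/podcasts/itunes/" htmlUrl="http://www.arcsight.com" />' | opml_to_links
# prints: <a href="http://www.arcsight.com/podcasts/itunes/" title="http://www.arcsight.com">ArcSight Podcasts</a>
```

If this prints nothing for a given file, the attribute order or the type value probably differs from the sample, and the pattern would need adjusting.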