Regex: extract a link from a text file

regex bash awk sed grep

I want to build a bash script that extracts the value of the first href attribute in a file. The value is a relative link.

     <td style="width: 35px;">              
      <a class="productName" href="/prd/amaz/prd151" style="color: #000000;display: inline-block; overflow: hidden">
<font style="font-weight: bold; color: #4f88b2; margin-left: 0px; width: auto" class="product-name">Amaz Prd 151</font></a>                    <br>                    
<font style="font-size: 11px; color: #828585"> Product                   </font>                    <br>
<a href="https://www.myhomedb.com/id=151"><div class="activatedCount withover" title="<div style='color: #0691ca; line-height: 15px; font-size: 11px;'><b>7 Smart Home DB Users<br/></b>actually own this product<br/><br/><b>Click to view their playbooks</b></div>"><span class="icon-size-16 product-category-icon-user-count"></span><span> 7</span></div></a>            </td>

So, for the snippet above, the correct output would be "/prd/amaz/prd151", because that is the text between the quotes of the first href attribute. Everything else in the file should be discarded, since I only need the relative link.

I would really appreciate your help with this. Thank you.

John

With grep and a lookbehind:

grep -oPm1 '(?<= href=")[^"]+' file
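
A short sketch of how that command behaves on a shortened copy of the sample from the question. -o prints only the matched text, -P turns on Perl-compatible regexes (needed for the lookbehind (?<= href=")), and -m1 stops after the first matching line. -P is a GNU grep feature, so this assumes GNU grep built with PCRE support. The function name first_href and the temporary sample file are made up for illustration. Since -m1 counts lines rather than matches, a line containing several href attributes would print all of them, so the sketch also pipes through head -n 1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first href value found in the file "$1".
# Assumes GNU grep with PCRE support (-P).
first_href() {
    grep -oPm1 '(?<= href=")[^"]+' "$1" | head -n 1
}

# Hypothetical sample file, shortened from the snippet in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<td style="width: 35px;">
<a class="productName" href="/prd/amaz/prd151" style="color: #000000">
<a href="https://www.myhomedb.com/id=151"><span> 7</span></a>
</td>
EOF

first_href "$sample"    # prints /prd/amaz/prd151
```

The lookbehind requires a space before href, which keeps attributes such as data-href from matching.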

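What have you tried?

If this is a proper HTML file, do not parse it with non-syntax-aware tools such as grep, awk, or sed. Use a syntax-aware parser.

Surely there is a way to extract all the href values with bash. I have seen examples such as sed "s/.*href=\"\(.*\)\".*/\1/" with the result written to output.txt, but this case seems trickier.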
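
Part of what makes the sed form tricky is that .* is greedy: on a line with several quoted attributes, the leading .* runs to the last href and \(.*\) runs to the last quote. As a rough sketch, not a definitive answer, here are two alternatives. The first uses POSIX awk, whose match() reports the leftmost match on the line. The second goes through an HTML parser, as suggested above, and assumes xmllint from libxml2 is installed. The function names and the sample line are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: first href value in file "$1", using POSIX awk only.
first_href_awk() {
    awk 'match($0, /href="[^"]*"/) {
             # RSTART is the position of the h in href; skip the 6 characters
             # of href=" and leave out the closing quote.
             print substr($0, RSTART + 6, RLENGTH - 7)
             exit    # stop after the first line that matches
         }' "$1"
}

# Hypothetical helper: same idea through an HTML parser.
# Assumes xmllint (from libxml2) is installed.
first_href_xmllint() {
    # string() applied to a node-set yields the first node in document order.
    xmllint --html --xpath 'string(//a/@href)' "$1" 2>/dev/null
}

# Hypothetical sample with two links on one line.
sample=$(mktemp)
printf '%s\n' '<p><a href="/first">x</a> <a href="/second">y</a></p>' > "$sample"

first_href_awk "$sample"    # prints /first
if command -v xmllint >/dev/null 2>&1; then
    first_href_xmllint "$sample"
fi
```

Unlike the grep lookbehind, the awk pattern does not require a space before href, so it would also match an attribute like data-href. The parser route avoids that kind of ambiguity.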