
How do I extract URL paths from an HTML file (in bash)?


I have a file, URL list.html, that contains multiple URL paths in the following format:

   <body contenteditable="true">
      <h1>File: <a href="https://test.com/Config.js" target="_blank" rel="nofollow noopener noreferrer">https://test.com/Config.js</a></h1>
      <div>
         <a href='/common/assets/locale/language_en.props' class='text'>/common/assets/locale/language_en.props</a>
         <div class='container'>                        urls: [e.get("app.content.domain") + "<span style='background-color:yellow'>/common/assets/locale/language_en.props</span>"]</div>
      </div>
      <div>
         <a href='/common/assets/locale/language_en1.props' class='text'>/common/assets/locale/language_en1.props</a>
         <div class='container'>                            remote: a + n + brandSuffix + "<span style='background-color:yellow'>>/common/assets/locale/language_en1.props</span>",</div>
      </div>
      <div>
         <a href='/common/assets/locale/language_en2.props' class='text'>/common/assets/locale/language_en2.props</a>
         <div class='container'>                            remote: a + n + "<span style='background-color:yellow'>>/common/assets/locale/language_en2.props</span>",</div>
      </div>
      <div>
         <a href='/common/assets/locale/language_en2.props' class='text'>/common/assets/locale/language_en2.props</a>
         <div class='container'>                            remote: a + n + "<span style='background-color:yellow'>>/common/assets/locale/language_en3.props</span>",</div>
      </div>
      <div>
         <a href='/common/assets/locale/language_en3.props' class='text'>/common/assets/locale/language_en3.props</a>
         <div class='container'>                            remote: a + n + "<span style='background-color:yellow'>>/common/assets/locale/language_en4.props</span>",</div>
      </div>
      <div>
     <a href='/main' class='text'>/main</a>
     <div class='container'>                    versionedAssets.isEnabled() &amp;&amp; (i = versionedAssets.getJSAsset("dashboard/boot"), r = versionedAssets.getJSAsset("dashboard<span style='background-color:yellow'>/main</span>"), l = versionedAssets.getJSAsset("appkit-utilities<span style='background-color:yellow'>/main</span>"), hybrid &amp;&amp; (i = versionedAssets.getHybridAsset("dashboard/boot"), r = versionedAssets.getHybridAsset("dashboard<span style='background-color:yellow'>/main</span>"))), envProps.get("app.blueJSVersion.enabled") ? (n.push([envProps.get("app.blueVendor.version") + "<span style='background-color:yellow'>/main</span>", envProps.get("app.blue.version") + "<span style='background-color:yellow'>/main</span>", envProps.get("app.blueApp.version") + "<span style='background-color:yellow'>/main</span>", envProps.get("app.blueView.version") + "<span style='background-color:yellow'>/main</span>", "blue-ui/dist/blue-ui/js<span style='background-color:yellow'>/main</span>", l, i, r]), n.push([{</div>
  </div>
Can anyone help me?

Update: I only need the URL paths highlighted in yellow (background-color:yellow).
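Given the update, one possible approach in bash is to let `grep -o` print every `background-color:yellow'>...` fragment on its own line and then strip the prefix with `sed`. This is a minimal sketch, not a definitive solution: it assumes the highlighted spans are written exactly as in the sample (single-quoted `style` attribute, one or two `>` before the path) and that a path never contains `<`. The function name `extract_yellow_paths` is hypothetical, `-o` and `-h` are GNU/BSD grep extensions rather than POSIX, and this is text matching, not HTML parsing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path inside every
# <span style='background-color:yellow'> element, one per line.
# Assumes the attribute is spelled exactly as in the sample (single quotes)
# and that a path never contains "<". Reads files given as arguments, or stdin.
extract_yellow_paths() {
  # -o prints each match on its own line, so several spans on one line
  # (like the /main line) all come out; [^<]* stops at </span>.
  # -h drops file-name prefixes when more than one file is given.
  grep -oh "background-color:yellow'>[^<]*" "$@" |
    # Drop the attribute text plus one or more ">" (the sample has both > and >>).
    sed "s/^background-color:yellow'>>*//"
}

# Usage on a few lines adapted from the question's sample.
extract_yellow_paths <<'EOF'
urls: [e.get("app.content.domain") + "<span style='background-color:yellow'>/common/assets/locale/language_en.props</span>"]</div>
remote: a + n + "<span style='background-color:yellow'>>/common/assets/locale/language_en2.props</span>",</div>
r = get("dashboard<span style='background-color:yellow'>/main</span>"), l = get("appkit-utilities<span style='background-color:yellow'>/main</span>")
EOF
```

On these lines it should print the language_en.props path, the language_en2.props path and /main twice; piping the result through `sort -u` would remove the duplicates.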

Try the code below:

var href = window.location.href;
var dir = href.substring(0, href.lastIndexOf('/')) + "/";

The following may help:

cat script.ksh
awk '/span/ && match($0,/<span style=\047background-color:yellow\047>>[^<]*/){print substr($0,RSTART+39,RLENGTH-39)}'  "$1"
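The awk answer prints at most one path per line, because `match()` only finds the first occurrence, and its pattern requires the `>>` that appears in some spans, so single-`>` spans such as those on the long /main line are skipped. A sketch of one way around both limits is to call `match()` repeatedly on the remainder of the line and strip the prefix with `sub()` instead of the fixed `+39` offset. The function name `yellow_paths_awk` is hypothetical, and the same assumption about the attribute spelling applies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every highlighted path on each line, not only
# the first, and accept either '>' or '>>' after the opening span tag.
# Reads the files given as arguments, or stdin when there are none.
yellow_paths_awk() {
  awk '
    {
      line = $0
      # \047 is a single quote; >+ allows one or more ">" characters.
      while (match(line, /background-color:yellow\047>+[^<]*/)) {
        hit = substr(line, RSTART, RLENGTH)
        # Strip the attribute text and the ">" run, keeping only the path.
        sub(/^background-color:yellow\047>+/, "", hit)
        if (hit != "") print hit
        # Continue scanning after this match.
        line = substr(line, RSTART + RLENGTH)
      }
    }' "$@"
}

yellow_paths_awk <<'EOF'
remote: a + n + brandSuffix + "<span style='background-color:yellow'>>/common/assets/locale/language_en1.props</span>",
r = get("dashboard<span style='background-color:yellow'>/main</span>"), l = get("appkit-utilities<span style='background-color:yellow'>/main</span>")
EOF
```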

You can do this in bash with awk:

awk -F'[ =]' '/href/ {print $3}' urls-list.html
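Explanation:
-F'[ =]' tells awk to use a space and "=" as field separators
/href/ makes the print command run on every line containing "href"
print $3 prints the third field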
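To see why this command depends on the exact layout: with `-F'[ =]'` every single space is a separator, so indentation produces empty leading fields and shifts the field numbers. The hypothetical `show_fields` helper below is only a diagnostic sketch; it prints each field with its index for an unindented and an indented copy of one sample line.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: split stdin the same way as the answer
# (-F'[ =]') and print every field with its index, so empty fields show up.
show_fields() {
  awk -F'[ =]' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

# Unindented line: the quoted href value is field 3.
printf '%s\n' "<a href='/main' class='text'>/main</a>" | show_fields
# Same line indented by two spaces: $1 and $2 are empty, $3 is "<a",
# and the href value moves to field 5.
printf '%s\n' "  <a href='/main' class='text'>/main</a>" | show_fields
```

In the sample file every line is indented, so print $3 lands on a different token than intended.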

However, this only works if the input lines are formatted exactly as in your example. A more robust approach is:

awk -F'href=' '/href/ {print $2}' urls-list.html | awk -F'[ <>]' '{print $1}'
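This pipeline still prints each value with its quotes attached (for example '/main' or "https://test.com/Config.js") and only looks at the first href= on each line. A minimal sketch that strips the quotes and handles several links per line, assuming every value is wrapped in ' or " and contains no quote characters; `href_values` is a hypothetical name, and `grep -o`/`-h` are GNU/BSD extensions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every href value without surrounding quotes.
# Assumes values are quoted with ' or " and contain no quote characters.
# Reads files given as arguments, or stdin when there are none.
href_values() {
  # -o prints each href=... match on its own line, even several per line;
  # -h drops file-name prefixes when more than one file is given.
  grep -oh "href=[\"'][^\"']*" "$@" |
    # Remove the leading href= and the opening quote.
    sed "s/^href=[\"']//"
}

href_values <<'EOF'
      <h1>File: <a href="https://test.com/Config.js" target="_blank">https://test.com/Config.js</a></h1>
         <a href='/common/assets/locale/language_en.props' class='text'>/common/assets/locale/language_en.props</a>
     <a href='/main' class='text'>/main</a>
EOF
```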

Thanks @RavinderSingh13, I tried this but it doesn't work on my HTML file. I just uploaded a new input sample. I only need the href= values.
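Sorry @RavinderSingh13, that was my mistake. This is the correct input. I should also add that all the paths are inside span tags with a yellow background color, and it doesn't pick them up. It's a big HTML file with many different tags. I only need the ones with background-color:yellow.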
Didn't work, it returns:
div div div div div
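Thanks @Paza, your bash works, but I'm not getting the result I want. I just uploaded the input; that was my mistake. Can you help me? @pancho, after your edit I can no longer see how the input relates to the output. Please check the input and output now.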