Linux: How to get the contents of a web page in a shell variable?


In Linux, how can I fetch a URL and store its contents in a variable in a shell script?

You can use the wget command to download the page and read it into a variable, like this:

content=$(wget google.com -q -O -)
echo "$content"

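We use wget's -O option, which lets us specify the name of the file into which wget dumps the page contents. We specify - so that the dump goes to standard output, and collect it into the variable content. You can add the -q (quiet) option to turn off wget's own output.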
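The exit status of an assignment such as content=$(...) is the exit status of the command inside it, so a script can check whether the download worked before using the variable. A minimal sketch, assuming two hypothetical helper names (fetch_page and load_page) that are not part of the original answer:

```shell
#!/bin/sh
# fetch_page (hypothetical name) prints the page to stdout and
# returns wget's exit status.
fetch_page() {
    # -q: suppress wget's progress output; -O -: write the page to stdout
    wget -q -O - "$1"
}

# load_page (hypothetical name) stores the page in the global variable
# `content`, or prints a message to stderr and returns non-zero on failure.
load_page() {
    if content=$(fetch_page "$1"); then
        return 0
    else
        echo "failed to download $1" >&2
        return 1
    fi
}
```

It could be called as load_page "http://example.com/" && printf '%s\n' "$content", where the URL is only a placeholder.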

You can use the curl command for this as well:

content=$(curl -L google.com)
echo "$content"

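We need the -L option because the page we are requesting might have moved, in which case we need to fetch it from its new location. The -L or --location option does this for us.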
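In a script it can help to combine -L with a few other curl options, so that HTTP errors produce a non-zero exit status and the progress meter stays out of the way. A sketch under assumptions: fetch_with_curl is a hypothetical helper name, and the redirect and time limits are arbitrary values chosen for illustration.

```shell
#!/bin/sh
# fetch_with_curl (hypothetical name) prints the page to stdout.
fetch_with_curl() {
    # -L  follow redirects to the page's new location
    # -f  return a non-zero exit status on HTTP errors (4xx/5xx)
    # -s  hide the progress meter; -S still prints an error message on failure
    # --max-redirs / --max-time: arbitrary safety limits (assumed values)
    curl -fsSL --max-redirs 5 --max-time 30 "$1"
}
```

The result can be captured the same way as before, for example content=$(fetch_with_curl "http://example.com/") || echo "download failed" >&2.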

There is the wget command, or curl. You can then work with the file you downloaded with wget, or handle the stream with curl.


You can use curl or wget to retrieve the raw data, or you can use w3m -dump to get a nice text representation of a web page.

$ foo=$(w3m -dump http://www.example.com/); echo $foo
You have reached this web page by typing "example.com", "example.net","example.org" or "example.edu" into your web browser. These domain names are reserved for use in documentation and are not available for registration. See RFC 2606, Section 3.
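The single-line output above is consistent with echo $foo being unquoted: the shell splits the unquoted value into words, so newlines and repeated spaces collapse into single spaces. The sketch below uses a stand-in string instead of a real download (an assumption, so that it runs without network access) to show the difference that quoting makes.

```shell
#!/bin/sh
# A stand-in for downloaded text, so the demo needs no network access.
page=$(printf 'first line\n   indented second line\n')

# Unquoted: word splitting collapses newlines and runs of spaces.
unquoted=$(echo $page)
# Quoted: the text is kept exactly as it was captured.
quoted=$(printf '%s\n' "$page")

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"
```

Note that command substitution itself always strips trailing newlines, whether or not the variable is quoted later.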

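There are many ways to get a page from the command line, but it also depends on whether you want the source code or the page itself.

If you need the source code:

With curl: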

curl $url

With wget:

wget -O - $url

But if you want what you would see in a browser, lynx can be very useful:

lynx -dump $url
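The two cases above (HTML source versus rendered text) can be wrapped in one small function. This is only a sketch: page_as is a hypothetical helper name, and it assumes curl and lynx are both installed.

```shell
#!/bin/sh
# page_as (hypothetical name): "source" prints the HTML source,
# "text" prints the page roughly as a text browser would render it.
page_as() {
    case "$1" in
        source) curl -s "$2" ;;
        text)   lynx -dump "$2" ;;
        *)      echo "usage: page_as source|text URL" >&2; return 2 ;;
    esac
}
```

For example, html=$(page_as source "$url") captures the markup, while text=$(page_as text "$url") captures the readable text.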

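I think you can find many solutions to this little problem; you may want to read the man pages for these commands. (Don't forget to replace $url with your URL.)

Good luck :)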

If you have LWP (libwww-perl) installed, it provides a binary simply named "GET".

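$ GET http://example.com
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
  <META http-equiv="Content-Type" content="text/html; charset=utf-8">
  <TITLE>Example Web Page</TITLE>
</HEAD>
<body>
<p>You have reached this web page by typing "example.com", "example.net","example.org" or "example.edu" into your web browser.</p>
<p>These domain names are reserved for use in documentation and are not available for registration. See <a href="http://www.rfc-editor.org/rfc/rfc2606.txt">RFC 2606</a>, Section 3.</p>
</BODY>
</HTML>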

wget -O -, curl, and lynx -source behave similarly.
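Since these commands behave similarly, a script can use whichever one is installed. A minimal sketch, assuming a hypothetical helper name (fetch_source) and an arbitrary order of preference:

```shell
#!/bin/sh
# fetch_source (hypothetical name) prints the page source using the first
# available client. The order of preference here is an arbitrary choice.
fetch_source() {
    if command -v curl >/dev/null 2>&1; then
        curl -s "$1"
    elif command -v wget >/dev/null 2>&1; then
        wget -q -O - "$1"
    elif command -v lynx >/dev/null 2>&1; then
        lynx -source "$1"
    elif command -v GET >/dev/null 2>&1; then
        GET "$1"
    else
        echo "no HTTP client found" >&2
        return 127
    fi
}
```

It is used like the earlier examples: content=$(fetch_source "http://example.com/").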

No curl, no wget, no ncat, nothing? Use telnet:

$ content=$(telnet localhost 80)
GET / HTTP/1.1
Host: localhost
Connection: close
 
Connection closed by foreign host.
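In the session above the request lines are typed by hand. For a script, the request can be generated and piped into telnet instead. This is a rough sketch rather than a robust client: http_request and fetch_via_telnet are hypothetical helper names, and the 2-second sleep (which keeps stdin open while the reply arrives) is an arbitrary assumption that may need adjusting.

```shell
#!/bin/sh
# http_request (hypothetical name) prints a minimal HTTP/1.1 request.
# $1 = host, $2 = path (defaults to /). HTTP requires CRLF line endings.
http_request() {
    printf 'GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "${2:-/}" "$1"
}

# fetch_via_telnet (hypothetical name) sends the request through telnet.
# $1 = host, $2 = path, $3 = port (defaults to 80).
# The sleep keeps telnet's stdin open long enough for the server to answer.
fetch_via_telnet() {
    { http_request "$1" "$2"; sleep 2; } | telnet "$1" "${3:-80}" 2>/dev/null
}
```

A call such as content=$(fetch_via_telnet localhost /) would then capture the response; depending on the telnet implementation, its own connection messages may appear in the captured text as well.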

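@rjack: (But the article you linked to does provide a nice example of the $(...) syntax.)

This is a very neat trick. I call a shell script via a PHP script on a proxy server. When asked, the proxy server turns on expensive servers, which shut themselves off after 2 hours. I need wget's output fed back to standard output for the Jenkins console log.

I still haven't got this... can someone show how to get the img tag from this link into a variable?

@juggernaut1996: That should be a separate question. Briefly, you have to download the page, extract the src attribute of the right element, and then download that page. If you install tq, this command should do it: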

curl -s http://ww1.watchop.io/manga2/read/one-piece/1/4 | tq -j -a src "#img将img" | xargs wget

wget version 1.14 does not accept convert_links=on together with the -O- option. It fails with the error "-k can be used together with -O only if outputting to a regular file." Is this expected?

If I were you, I'd double-quote the URL.

$ GET http://example.com
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
  <META http-equiv="Content-Type" content="text/html; charset=utf-8">
  <TITLE>Example Web Page</TITLE>
</HEAD>
<body>
<p>You have reached this web page by typing "example.com", "example.net","example.org" or "example.edu" into your web browser.</p>
<p>These domain names are reserved for use in documentation and are not available for registration. See <a href="http://www.rfc-editor.org/rfc/rfc2606.txt">RFC 2606</a>, Section 3.</p>
</BODY>
</HTML>

$ content=$(telnet localhost 80)
GET / HTTP/1.1
Host: localhost
Connection: close
 
Connection closed by foreign host.

$ echo $content
HTTP/1.1 200 OK Date: Mon, 22 Mar 2021 12:45:02 GMT Server:
Apache/2.4.46 (Fedora) OpenSSL/1.1.1j Last-Modified: Mon, 31 Dec 2018
15:56:45 GMT ETag: "a4-57e5375ad21bd" Accept-Ranges: bytes
Content-Length: 164 Connection: close Content-Type: text/html;
charset=UTF-8 Success! 192.168.1.1
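As the output above shows, a raw response captured this way contains the status line and headers as well as the body. A minimal sketch for keeping only the body, assuming a hypothetical helper name (strip_headers) and a text response; note that it removes every carriage return, so it is not suitable for binary content:

```shell
#!/bin/sh
# strip_headers (hypothetical name) reads a raw HTTP response on stdin and
# prints only the body, i.e. everything after the first blank line.
strip_headers() {
    # tr removes the CR of each CRLF so the header/body separator becomes
    # an empty line; sed then deletes from line 1 through that empty line.
    tr -d '\r' | sed '1,/^$/d'
}
```

For example: body=$(printf '%s\n' "$content" | strip_headers).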