Wget: skip the download if the file already exists?


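An existing answer says to use -nc, or --no-clobber, but -nc does not prevent the HTTP request from being sent or the file from being downloaded. If the file has already been fully retrieved, wget simply does nothing with it after the download. Is there any way to prevent the HTTP request from being made when the file already exists?

I have wget installed. After I run the command below, wget issues an HTTP request for every file that already exists, appears to download it, and then prints something like: The file is already fully retrieved; nothing to do.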

wget --user-agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/600.7.12 (KHTML, like Gecko) Version/8.0.7 Safari/600.7.12' \
     --tries=1 \
     --no-clobber \
     --continue \
     --wait=0.3 \
     --random-wait \
     --adjust-extension \
     --load-cookies cookies.txt \
     --save-cookies cookies.txt \
     --keep-session-cookies \
         --recursive \
         --level=inf \
         --convert-links \
         --page-requisites \
         --reject=edit,logout,rate \
         --domains=example.com,s3.amazonaws.com \
         --span-hosts \
         --exclude-directories=/admin \
     http://example.com/

Some of the options you are using are incompatible. With wget 1.16 on Linux I get the following warning:
$ wget --no-clobber --convert-links http://example.com
Both --no-clobber and --convert-links were specified, only --convert-links will be used.
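
Since the warning means --no-clobber is silently dropped, it can help to detect the combination before starting a long recursive run. The following is only a minimal sketch: the function name nc_is_overridden is hypothetical, and it recognises only flags written as separate words (-nc, --no-clobber, -k, --convert-links), not bundled or abbreviated options.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeeds when the argument list contains
# both --no-clobber and --convert-links, the pair wget warns about.
# Assumption: flags are passed as separate words; bundled short options
# (e.g. -rk) and abbreviated long options are not parsed here.
nc_is_overridden() {
    has_nc=0
    has_k=0
    for arg in "$@"; do
        case $arg in
            -nc|--no-clobber)   has_nc=1 ;;
            -k|--convert-links) has_k=1 ;;
        esac
    done
    # Exit status 0 only when both flags were seen.
    [ "$has_nc" -eq 1 ] && [ "$has_k" -eq 1 ]
}
```

It could be called as nc_is_overridden "$@" && echo "--no-clobber will be ignored" >&2 at the top of a wrapper script, before the arguments are handed to wget.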

The -nc option does what you are asking for, at least in wget 1.19.1.

On my server I have a file named index.html that contains links to a.html and b.html:
$ wget -r -nc http://127.0.0.1:8000/

The server log shows:
127.0.0.1 - - [23/Mar/2017 17:51:25] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [23/Mar/2017 17:51:25] "GET /robots.txt HTTP/1.1" 404 -
127.0.0.1 - - [23/Mar/2017 17:51:25] "GET /a.html HTTP/1.1" 200 -
127.0.0.1 - - [23/Mar/2017 17:51:25] "GET /b.html HTTP/1.1" 200 -


Now I delete b.html and run it again:
$ rm 127.0.0.1\:8000/b.html
$ wget -r -nc http://127.0.0.1:8000/

The server log now shows (the 17:51:38 entries are from the second run):
127.0.0.1 - - [23/Mar/2017 17:51:25] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [23/Mar/2017 17:51:25] "GET /robots.txt HTTP/1.1" 404 -
127.0.0.1 - - [23/Mar/2017 17:51:25] "GET /a.html HTTP/1.1" 200 -
127.0.0.1 - - [23/Mar/2017 17:51:25] "GET /b.html HTTP/1.1" 200 -

127.0.0.1 - - [23/Mar/2017 17:51:38] "GET /robots.txt HTTP/1.1" 404 -
127.0.0.1 - - [23/Mar/2017 17:51:38] "GET /b.html HTTP/1.1" 200 -

As you can see, only b.html was requested on the second run.
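
For a single URL outside recursive mode, the same "no request if the file is there" behaviour can be made explicit by checking for the local file before wget is invoked at all. This is a minimal sketch, not what wget does internally: fetch_if_missing and the WGET variable are hypothetical names, and the destination path is passed explicitly rather than derived from the URL.

```shell
#!/bin/sh
# Hypothetical helper: never touch the network when the target exists.
# WGET may be overridden (for example with a stub); defaults to wget.
WGET="${WGET:-wget}"

fetch_if_missing() {
    url=$1
    dest=$2
    if [ -e "$dest" ]; then
        # Local copy present: no request of any kind is sent.
        echo "skip: $dest already exists"
        return 0
    fi
    # Download to a temporary name first, so that a failed transfer
    # does not leave a partial file that would be skipped next time.
    "$WGET" -O "$dest.part" "$url" && mv "$dest.part" "$dest"
}
```

Usage would look like fetch_if_missing http://127.0.0.1:8000/b.html b.html; a second call with the same destination prints the skip message and returns without running wget.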
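In fact, wget makes no request at all for a file that already exists, not even a HEAD request to compare sizes. Try it and/or read the source code: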