How do I use wget/curl to download all links to .zip files on a given web page?

A page contains links to a set of .zip files, and I want to download all of them. I know this can be done with wget and curl. How is it done?

The command is:

wget -r -np -l 1 -A zip http://example.com/download/

Option meanings:

-r,  --recursive          specify recursive download.
-np, --no-parent          don't ascend to the parent directory.
-l,  --level=NUMBER       maximum recursion depth (inf or 0 for infinite).
-A,  --accept=LIST        comma-separated list of accepted extensions.

The solution above did not work for me. The only one that worked for me was:

wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off [url of website]

Option meanings:

-r            recursive
-l1           maximum recursion depth (1=use only this directory)
-H            span hosts (visit other hosts in the recursion)
-t1           Number of retries
-nd           Don't make new directories, put downloaded files in this one
-N            turn on timestamping
-A.mp3        download only mp3s
-erobots=off  execute "robots=off" as if it were a part of .wgetrc
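
The command above accepts only .mp3 files, while the question asks about .zip files, so the accept list is the part to change. A minimal sketch, assuming a hypothetical helper named `wget_cmd_for` (not part of wget) that only prints the command, so it can be inspected before anything is downloaded:

```shell
# Hypothetical helper: print (do not run) the wget command from this answer
# with the accept list swapped to the given extension.
#   $1 = file extension without the dot, e.g. zip
#   $2 = URL of the page that links to the files
wget_cmd_for() {
  ext=$1
  url=$2
  printf 'wget -r -l1 -H -t1 -nd -N -np -A.%s -e robots=off %s\n' "$ext" "$url"
}

# The URL is the example address used in the question.
wget_cmd_for zip 'http://example.com/download/'
```

If the printed command looks right, it can be pasted into the shell and run as-is.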

For other scenarios, with a bit of parallel magic, I use:

curl [url] | grep -i [filending] | sed -n 's/.*href="\([^"]*\).*/\1/p' |  parallel -N5 wget -
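
To see what the grep and sed stages of that pipeline do without touching the network, they can be run against a small HTML sample. This is only a sketch: the function name `extract_links` and the sample markup are assumptions made for illustration, and `grep -F` is used here so the file ending is matched literally rather than as a regular expression.

```shell
# Hypothetical helper: read HTML on stdin and print the href targets of
# lines that mention the given file ending (case-insensitive).
#   $1 = file ending to keep, e.g. ".zip"
extract_links() {
  grep -iF -- "$1" | sed -n 's/.*href="\([^"]*\).*/\1/p'
}

# Made-up sample standing in for the output of `curl [url]`.
sample_html='<a href="http://example.com/download/a.zip">a</a>
<a href="http://example.com/download/readme.txt">readme</a>
<a href="http://example.com/download/b.ZIP">b</a>'

# Prints the two .zip links, one per line.
printf '%s\n' "$sample_html" | extract_links '.zip'
```

Because the sed pattern starts with a greedy `.*`, it keeps only the last href on each line, so pages that put several links on one line would need a different extraction step. Relative links are also printed unchanged and would have to be joined with the page URL before being passed to wget.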

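If you don't want any extra directories created (i.e., all files end up in the root folder), the -nd (no directories) flag comes in very handy.

How can I adapt this solution to go deeper from the given page? I tried -L20, but wget stopped immediately.

If the files are not in the same directory as the starting URL, you may need to remove -np. If they are on a different host, you will need --span-hosts.

Is there a way to keep the website's directory structure but exclude only the root folder, so that the current directory is directly the root of the website rather than a folder named after the website's URL?

Source:

Yes, thanks! I couldn't remember where it came from; it was just sitting in my scripts.

Don't know, sorry. Ask a new question! ;)

+1 for the -H switch. That is what was preventing the first answer (which is what I tried before looking here) from working.

No, you answered this question on 2013-09-10.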