What's the fastest and easiest way to download all the images from a website? (curl, wget)


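What is the fastest and easiest way to download all the images from a website? More specifically, the images under http://www.cycustom.com/large/ (the directory used in the command below).

I'm thinking of something along the lines of wget or curl.

To clarify: first, I currently do not know how to accomplish this task. Second, I'm interested in seeing whether wget or curl offers an easier-to-understand solution. Thanks.

--- UPDATE @sarnold ---

Thank you for responding. I thought that would do the trick too. However, it does not. Here is the output of the command: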

wget --mirror --no-parent http://www.cycustom.com/large/
--2012-01-10 18:19:36--  http://www.cycustom.com/large/
Resolving www.cycustom.com... 64.244.61.237
Connecting to www.cycustom.com|64.244.61.237|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `www.cycustom.com/large/index.html'

    [  <=>                                                                                                                                                                                                                                  ] 188,795      504K/s   in 0.4s    

Last-modified header missing -- time-stamps turned off.
2012-01-10 18:19:37 (504 KB/s) - `www.cycustom.com/large/index.html' saved [188795]

Loading robots.txt; please ignore errors.
--2012-01-10 18:19:37--  http://www.cycustom.com/robots.txt
Connecting to www.cycustom.com|64.244.61.237|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 174 [text/plain]
Saving to: `www.cycustom.com/robots.txt'

100%[======================================================================================================================================================================================================================================>] 174         --.-K/s   in 0s      

2012-01-10 18:19:37 (36.6 MB/s) - `www.cycustom.com/robots.txt' saved [174/174]

FINISHED --2012-01-10 18:19:37--
Downloaded: 2 files, 185K in 0.4s (505 KB/s)

--no-parent prevents wget from slurping up the entire website.

Ahh, I see they have placed a robots.txt asking robots not to download photos from that directory:

$ curl http://www.cycustom.com/robots.txt
User-agent: *
Disallow: /admin/
Disallow: /css/
Disallow: /flash/
Disallow: /large/
Disallow: /pdfs/
Disallow: /scripts/
Disallow: /small/
Disallow: /stats/
Disallow: /temp/
$
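To make the effect of that file concrete, here is a minimal Ruby sketch that checks a path against the rules shown above. It assumes simple prefix matching on the Disallow lines of the `*` user-agent group; the helper names `disallowed_prefixes` and `disallowed?` and the sample path `/large/photo.jpg` are hypothetical, and a real robots.txt parser handles more cases (Allow lines, wildcards, several groups).

```ruby
# The robots.txt contents, copied from the curl output above.
ROBOTS_TXT = <<~ROBOTS
  User-agent: *
  Disallow: /admin/
  Disallow: /css/
  Disallow: /flash/
  Disallow: /large/
  Disallow: /pdfs/
  Disallow: /scripts/
  Disallow: /small/
  Disallow: /stats/
  Disallow: /temp/
ROBOTS

# Collect the Disallow prefixes that apply to every user agent ("*").
def disallowed_prefixes(robots_txt)
  prefixes = []
  applies = false
  robots_txt.each_line do |line|
    field, value = line.strip.split(":", 2).map(&:strip)
    next if field.nil? || value.nil? # blank line or no "field: value" pair
    case field.downcase
    when "user-agent"
      applies = (value == "*")
    when "disallow"
      prefixes << value if applies && !value.empty?
    end
  end
  prefixes
end

# True when the path starts with any disallowed prefix.
def disallowed?(robots_txt, path)
  disallowed_prefixes(robots_txt).any? { |prefix| path.start_with?(prefix) }
end

puts disallowed?(ROBOTS_TXT, "/large/photo.jpg") # hypothetical path => true
puts disallowed?(ROBOTS_TXT, "/index.html")      # => false
```

Everything under /large/ is off limits to a well-behaved crawler, which is why wget fetched only index.html and robots.txt and then stopped.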

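wget(1) does not document any way to ignore robots.txt, and I have never found an easy way to do the equivalent of --mirror in curl(1). If you want to keep using wget(1), you would need to put an HTTP proxy in the middle that returns 404 for GET /robots.txt requests. I think it is easier to change approach. Since I wanted to get more hands-on experience anyway, here is what I came up with: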

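#!/usr/bin/ruby
require 'open-uri'
require 'nokogiri'

# Fetch the directory index and parse it as HTML.
# URI.open is used because plain open() no longer accepts URLs in Ruby 3.
doc = Nokogiri::HTML(URI.open("http://www.cycustom.com/large/"))

# Every link in a table cell is a candidate file name.
doc.css('tr > td > a').each do |link|
  name = link['href']
  next unless name.match(/jpg/)
  # Write the response body (.read), not the IO object itself.
  File.open(name, "wb") do |out|
    out.write(URI.open("http://www.cycustom.com/large/" + name).read)
  end
end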

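This is just a quick and dirty script; embedding the URL twice is a bit ugly. So if this is meant for long-term production use, clean it up first, or figure out how to use rsync(1) instead.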
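One possible cleanup of the duplicated URL, as a minimal sketch rather than the answer's own code: keep the base URL in one constant and resolve each link against it with `URI.join`. The names `BASE_URL`, `image_urls`, and `local_name`, and the sample hrefs in the usage lines, are hypothetical; the sketch assumes the hrefs are already valid URI references (no unescaped spaces).

```ruby
require 'uri'

# The directory index URL, stated once.
BASE_URL = "http://www.cycustom.com/large/"

# Turn href values from the index page into absolute image URLs,
# keeping only names that end in .jpg or .jpeg (case-insensitive).
def image_urls(base_url, hrefs)
  hrefs
    .select { |href| href =~ /\.jpe?g\z/i }
    .map { |href| URI.join(base_url, href).to_s }
end

# Local file name for a URL: the last segment of its path.
def local_name(url)
  File.basename(URI.parse(url).path)
end

# Hypothetical hrefs, standing in for the links found on the index page.
sample = ["a.jpg", "b.JPG", "notes.txt", "../"]
urls = image_urls(BASE_URL, sample)
puts urls
puts urls.map { |u| local_name(u) }

# Download loop (performs network requests, so it is left commented out):
#   require 'open-uri'
#   urls.each { |url| File.binwrite(local_name(url), URI.open(url).read) }
```

Anchoring the pattern on the file extension also avoids matching names that merely contain "jpg" somewhere in the middle.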

You can make wget ignore the robots.txt file by adding the following option:

-e robots=off

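I would also recommend adding an option that slows the download down, to limit the load on the server. For example, this option waits 30 seconds between one file and the next: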

--wait 30
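Putting these options together with the command from the question, the full invocation might look like the sketch below. It is written as a Ruby argument list so it can be passed to `system` without shell quoting; the helper name `wget_mirror_args` is hypothetical, and the 30-second default simply mirrors the example above.

```ruby
# Build the wget argument list: mirror one directory, ignore robots.txt,
# and pause between files to go easy on the server.
def wget_mirror_args(url, wait_seconds: 30)
  [
    "wget",
    "--mirror", "--no-parent",   # recurse, but never climb above the start directory
    "-e", "robots=off",          # do not honor robots.txt
    "--wait", wait_seconds.to_s, # seconds to wait between retrievals
    url
  ]
end

args = wget_mirror_args("http://www.cycustom.com/large/")
puts args.join(" ")
# prints: wget --mirror --no-parent -e robots=off --wait 30 http://www.cycustom.com/large/

# To actually run the download (requires wget on the PATH):
#   system(*args)
```

A shorter wait, for example `wait_seconds: 2`, speeds things up at the cost of more load on the server.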

@sarnold: edited the original question to include the results of your suggestion.