
Python 2.7: How do I get all the software links?


I have the following code:

import urllib
import urlparse
from bs4 import BeautifulSoup

url = "http://www.downloadcrew.com/?act=search&cat=51"
pageHtml = urllib.urlopen(url)
soup = BeautifulSoup(pageHtml)

for a in soup.select("div.productListingTitle a[href]"):
    try:
        print (a["href"]).encode("utf-8","replace")
    except:
        print "no link"

But when I run it, I only get 20 links. The output should contain more than 20 links.

That's because you are only downloading the first page of content.

Just use a loop to load all the pages:

import urllib
import urlparse
from bs4 import BeautifulSoup

# Loop over the first three result pages (page=0, 1 and 2).
for i in xrange(3):
    url = "http://www.downloadcrew.com/?act=search&page=%d&cat=51" % i
    pageHtml = urllib.urlopen(url)
    soup = BeautifulSoup(pageHtml)

    for a in soup.select("div.productListingTitle a[href]"):
        try:
            print (a["href"]).encode("utf-8", "replace")
        except:
            print "no link"
If you don't know the number of pages, you can do this:

import urllib
import urlparse
from bs4 import BeautifulSoup

i = 0
while 1:
    url = "http://www.downloadcrew.com/?act=search&page=%d&cat=51" % i
    pageHtml = urllib.urlopen(url)
    soup = BeautifulSoup(pageHtml)

    # Set to 1 as soon as the current page yields at least one link.
    has_more = 0
    for a in soup.select("div.productListingTitle a[href]"):
        has_more = 1
        try:
            print (a["href"]).encode("utf-8", "replace")
        except:
            print "no link"
    if has_more:
        i += 1   # this page had results, so try the next one
    else:
        break    # an empty page means we are past the last one
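The has_more flag is the key design choice here: it is reset to 0 on every iteration and set to 1 only if the selector matched at least one link, so the loop advances page by page and stops at the first page that comes back empty.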
I ran it on my machine and it fetched 60 links across the three pages.

Good luck~
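Incidentally, both snippets import urlparse without ever using it. If the extracted hrefs ever come back as relative paths rather than absolute URLs (an assumption; the actual output is not shown here), urlparse.urljoin can resolve them against the page URL. A minimal sketch:

import urllib
import urlparse
from bs4 import BeautifulSoup

url = "http://www.downloadcrew.com/?act=search&page=0&cat=51"
soup = BeautifulSoup(urllib.urlopen(url))

for a in soup.select("div.productListingTitle a[href]"):
    # urljoin leaves absolute URLs untouched and resolves relative
    # ones against the page URL, so it is safe to apply either way.
    print urlparse.urljoin(url, a["href"]).encode("utf-8", "replace")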

Why would there be more than 20 links? There are only 20 links per page.
@Blorgbeard, because there are many more pages listed at the bottom. You only downloaded the first page; you have to loop through all of them.
@Blorgbeard, how do I do that?
Click one of the page links and look at the address. It will contain something like &page=123. So loop from 1 up to the page count, build each page URL, and download every page, if there are, say, 3 pages.
And what if you don't know the total number of pages?
You can write a while loop and break once you stop getting any results. See the example in my answer.
Thank you, it works! I had left out the d in %d, which is why the while loop didn't work.
Glad to hear it :). If it worked for you, you can accept the answer, thx.
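For reference, the %d typo mentioned in the comments makes the %-formatting fail outright, so the loop never builds a single URL. A minimal illustration in Python 2:

page_template = "http://www.downloadcrew.com/?act=search&page=%&cat=51"

# "%" must be followed by a conversion character such as "d"; "&" is
# not one, so this line raises "ValueError: unsupported format character".
url = page_template % 0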