Python urlopen in a for loop

I am trying to extract some information from a webpage, and I have the following code:

import re
from math import ceil
from urllib.request import urlopen as uReq, Request
from bs4 import BeautifulSoup as soup

InitUrl = "https://mtgsingles.gr/search?q="
NumOfCrawledPages = 0
URL_Next = ""
NumOfPages = 5

for i in range(0, NumOfPages):
    if i == 0:
        Url = InitUrl
    else:
        Url = URL_Next

    UClient = uReq(Url)  # downloading the url
    page_html = UClient.read()
    UClient.close()

    page_soup = soup(page_html, "html.parser")

    cards = page_soup.findAll("div", {"class": ["iso-item", "item-row-view"]})

    for card in cards:
        card_name = card.div.div.strong.span.contents[3].contents[0].replace("\xa0 ", "")

        if len(card.div.contents) > 3:
            cardP_T = card.div.contents[3].contents[1].text.replace("\n", "").strip()
        else:
            cardP_T = "Does not exist"

        cardType = card.contents[3].text
        print(card_name + "\n" + cardP_T + "\n" + cardType + "\n")

    try:
        URL_Next = "https://mtgsingles.gr" + page_soup.findAll("li", {"class": "next"})[0].contents[0].get("href")
        print("The next URL is: " + URL_Next + "\n")
    except IndexError:
        print("Crawling process completed! No more information to retrieve!")
    else:
        print("The next URL is: " + URL_Next + "\n")
        NumOfCrawledPages += 1
        Url = URL_Next
    finally:
        print("Moving to page : " + str(NumOfCrawledPages + 1) + "\n")

The code runs without any errors, but the results are not what I expected. I am trying to extract some information from each page, as well as the URL of the next page. In the end I want the program to run 5 times and crawl 5 pages. But this code crawls the given initial page (InitUrl = "xyz.com") 5 times and never moves on to the next-page URL that it extracts. I tried to debug it by inserting some print statements to see where the problem lies, and I think the problem is in these statements:

UClient = uReq(Url)
page_html = UClient.read()
UClient.close()
For some reason, urlopen does not seem to work repeatedly inside the for loop.
Why is this happening?
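
For what it's worth, urlopen itself has no problem being called repeatedly in a loop; the symptom suggests that Url simply never changes between iterations. A minimal way to check this (a debugging sketch based on the code above, not part of the original program) is to print the URL right before each request:

from urllib.request import urlopen as uReq

Url = "https://mtgsingles.gr/search?q="
for i in range(5):
    # Show which URL this iteration actually requests; if this prints the
    # same value every time, the pagination step never updated Url.
    print("Iteration", i, "fetching:", Url)
    UClient = uReq(Url)
    page_html = UClient.read()
    UClient.close()
    print("Received", len(page_html), "bytes")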

@Petris, please double-check your variable names, e.g. InitURL vs. InitUrl, and try again. If you can share a link to the actual website, it will be easier for us to track down the problem, in case you are still having this issue.

This problem may be your last else clause. It has no if, so that block throws an error, which means Url is never set beyond the value it already has...

I checked the variable names and they are all fine! I just mistyped them when posting the question here. The website is https://mtgsingles.gr
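
Building on these comments, here is one way the pagination step could be restructured so that the next URL is either found cleanly or the loop stops explicitly. This is only a sketch under an assumption: that the site's pagination markup is an <li class="next"> element wrapping an <a href="..."> link, as the selector in the question suggests.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

InitUrl = "https://mtgsingles.gr/search?q="
NumOfPages = 5

Url = InitUrl
for i in range(NumOfPages):
    UClient = uReq(Url)
    page_html = UClient.read()
    UClient.close()

    page_soup = soup(page_html, "html.parser")

    # ... card extraction as in the question ...

    # Locate the <a> inside <li class="next"> directly instead of indexing
    # into .contents, which may pick up a whitespace text node.
    next_li = page_soup.find("li", class_="next")
    if next_li is None or next_li.a is None:
        print("Crawling process completed! No more information to retrieve!")
        break

    Url = "https://mtgsingles.gr" + next_li.a.get("href")
    print("The next URL is: " + Url + "\n")

With this structure there is no separate URL_Next variable to keep in sync: Url is updated in place at the end of each iteration, so the next pass through the loop requests the new page.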