
Python: code sometimes runs, sometimes errors out


Below is my code for scraping a website with BeautifulSoup. The code runs fine on Windows, but on Ubuntu it sometimes runs and sometimes throws an error.

The error is:

Traceback (most recent call last):
  File "Craftsvilla.py", line 22, in <module>
    source =  requests.get(new_url)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 487, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.craftsvilla.com', port=80): Max retries exceeded with url: /shop/01-princess-ayesha-cotton-salwar-suit-for-rudra-house/5601472 (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6685fc3310>: Failed to establish a new connection: [Errno -2] Name or service not known',))
import requests
import xlrd
import xlwt
from bs4 import BeautifulSoup

file_location = "/home/nitink/Python Linux/BeautifulSoup/Craftsvilla/Craftsvilla.xlsx"

# Read the product URLs from the first column of the input sheet.
workbook = xlrd.open_workbook(file_location)
sheet = workbook.sheet_by_index(0)

products = []
for r in range(sheet.nrows):
    products.append(sheet.cell_value(r, 0))

# Output workbook: one row per URL, with the scraped product name.
book = xlwt.Workbook(encoding="utf-8", style_compression=0)
sheet = book.add_sheet("Sheet11", cell_overwrite_ok=True)

for index, url in enumerate(products):
    new_url = "http://www." + url
    source = requests.get(new_url)
    soup = BeautifulSoup(source.content, "lxml")

    sheet.write(index, 0, url)

    try:
        product_name = soup.select(".product-title")[0].text.strip()
        sheet.write(index, 1, product_name)
    except Exception:
        sheet.write(index, 1, "")  # leave the cell blank if the title is missing

book.save("Craftsvilla Output.xls")
Save the following links as Craftsvilla.xlsx:

craftsvilla.com/shop/01-princess-ayesha-cotton-salwar-suit-for-rudra-house/5601472
craftsvilla.com/shop/3031-pista-prachi/3715170
craftsvilla.com/shop/795-peach-colored-stright-salwar-suit/5608295
craftsvilla.com/catalog/product/view/id/5083511/s/dharm-fashion-villa-embroidery-navy-blue-slawar-suit-gown
Note: for some of the links the code runs, but after trying for a while the same code throws the error. I don't know why. The same code never errors on Windows.
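One way to make intermittent failures survivable is to retry the request a few times before giving up. This is only a sketch, not part of the original script; fetch_with_retry is a hypothetical helper name, and the fetching callable is injected so the logic can be exercised without network access:

```python
import time


def fetch_with_retry(get, url, retries=3, delay=1.0):
    """Call get(url), retrying with exponential backoff on any exception.

    get is a callable like requests.get, passed in explicitly so the
    helper can be tested with a fake fetcher.
    """
    for attempt in range(retries):
        try:
            return get(url)
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the original error
            time.sleep(delay * (2 ** attempt))  # wait 1s, 2s, 4s, ...
```

In the loop above you would then call fetch_with_retry(requests.get, new_url) instead of requests.get(new_url).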

It looks like you are hitting the site too often and the server is refusing your requests. Add a time delay between consecutive requests:

import time

for index, url in enumerate(products):
    new_url = "http://www." + url
    source = requests.get(new_url)
    data = source.content
    soup = BeautifulSoup(data, "lxml")

    # ...

    time.sleep(1)  # one second delay
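Besides a fixed delay, requests can retry transient connection failures itself through urllib3's Retry class mounted on a Session. This is a sketch assuming the requests and urllib3 versions installed are recent enough to expose these import paths (older releases, like the one in the traceback, bundle urllib3 as requests.packages.urllib3):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry failed connections up to 5 times with exponential backoff
# (0.5s, 1s, 2s, ...) instead of failing on the first error.
retry = Retry(total=5, backoff_factor=0.5,
              status_forcelist=[429, 500, 502, 503, 504])
session = requests.Session()
session.mount("http://", HTTPAdapter(max_retries=retry))
session.mount("https://", HTTPAdapter(max_retries=retry))

# Then use session.get(new_url) in place of requests.get(new_url).
```

A Session also reuses TCP connections across requests, which by itself reduces the load that triggered the refusals.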

I think you are sending too many requests from the same IP address in a short time, so the server may be refusing your connection. But then why does the same code never error on Windows?

Add print(new_url) right after new_url is built; I suspect you are reading the xlsx file and getting incomplete data.

pip install pyopenssl. Sometimes this is just an SSL error, where your requests call keeps retrying and failing.
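The "Name or service not known" error in the traceback is a DNS failure, and stray whitespace or an empty cell in the spreadsheet can trigger it just as easily as a real network outage. A small helper (normalize_url is a hypothetical name, not in the original code) could guard against that before each request:

```python
def normalize_url(cell_value):
    """Clean a URL read from a spreadsheet cell before requesting it.

    A trailing newline or space in a cell corrupts the hostname,
    producing the same 'Name or service not known' DNS error as a
    genuine connectivity problem.
    """
    url = str(cell_value).strip()
    if not url:
        return None  # skip empty rows instead of requesting ""
    if not url.startswith(("http://", "https://")):
        url = "http://www." + url
    return url
```

Printing each cleaned URL, as suggested above, would quickly show whether the Ubuntu run is reading different data than the Windows run.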