Python BeautifulSoup cannot retrieve the links of a web page

Tags: python, web-scraping, beautifulsoup, python-requests, web-crawler

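I am trying to collect the URLs on a website's listing page, but my BeautifulSoup script cannot do it. I get the following exception, even when I try sending request headers.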

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 384, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 380, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1321, in getresponse
    response.begin()
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 296, in begin
    version, status, reason = self._read_status()
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 257, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
TimeoutError: [Errno 60] Operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 368, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 686, in reraise
    raise value
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 317, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='www.sahibinden.com', port=80): Read timed out. (read timeout=None)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/soner/PycharmProjects/bitirme2/main.py", line 8, in <module>
    r = requests.get(url)
  File "/usr/local/lib/python3.7/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='www.sahibinden.com', port=80): Read timed out. (read timeout=None)

Process finished with exit code 1

Note that the site sahibinden.com may be blocking your requests. I have not looked into using a proxy list together with BeautifulSoup.
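The last line of the traceback reports `read timeout=None`: `requests.get` was called without a `timeout`, so requests itself imposed no time limit and the error surfaced only when the operating system gave up on the socket (`[Errno 60] Operation timed out`). A minimal sketch, assuming a 10-second limit is acceptable, of how the request could be bounded and the failure reported. The helper name `fetch_listing` and the timeout value are illustrative and not part of the original code:

```python
import requests

HEADERS = {
    # Same browser-like User-Agent as in the answer's snippet.
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
}


def fetch_listing(url, timeout=10):
    """Return the page HTML, or None if the request fails or times out."""
    try:
        # timeout bounds the connection attempt and each wait for data.
        r = requests.get(url, headers=HEADERS, timeout=timeout)
    except requests.exceptions.Timeout as exc:
        # Base class of both ConnectTimeout and ReadTimeout.
        print('Request timed out:', exc)
        return None
    except requests.exceptions.RequestException as exc:
        print('Request failed:', exc)
        return None
    return r.text if r.ok else None
```

With a timeout set, a blocked or unresponsive server produces a quick, explicit message instead of a long hang followed by a three-level traceback.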

Here is the snippet I ran, which works as expected:

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
}

url = 'http://www.sahibinden.com/satilik/istanbul-kartal?pagingOffset=50&pagingSize=50'

r = requests.get(url, headers=headers)
if r.ok:
    soup = BeautifulSoup(r.text, 'lxml')
    for a in soup('a', 'classifiedTitle'):
        print(a.get('href'))
Here is the output of the code above:

/ilan/emlak-konut-satilik-...-kolay-sik-3-plus1-671049902/detay
/ilan/emlak-konut-satilik-nesrin-den-kartal-ugurmumcuda-satilik-3-plus1-yunus-emre-caddesinde-692133846/detay
/ilan/emlak-konut-satilik-akelden-karliktepe-de-genis-m2-li-krediye-Uquiun-daire-659458837/detay
/ilan/emlak-konut-satilik-ikea-ve-metro-yani-teknik-yapi-uprise-elite_mukemmel-firsat-3-plus1-692131163/detay
/ilan/emlak-konut-satilik-kartal-atalar-da-iskanli-5-plus1-dubleks-satilik-daire-692125302/detay
/ilan/emlak-konut-satilik-satilik-daire-kartal-atalar-da-2-plus1-lux-100-m2-671083034/detay
/ilan/emlak-konut-satilik-kartal-ugurmumcuda-3-plus1-genis-masrafsiz-satilik-daire-681180607/detay
/ilan/emlak-konut-satilik-soner-den-manzara-adalar-da-satilik-Kacirillilmayacak-kelepir-daire-653973723/detay
/ilan/emlak-konut-satilik-mertcan-dan-tarihi-ayazma-caddesinde-2-plus1-satilik-ters-dubleks-692122837/detay
/ilan/emlak-konut-satilik-cinar-emlak%2Ctan-hurriyet-mah-105-m2-toprak-tapulu-692117031/detay
/ilan/emlak-konut-satilik-kartal-cumhuriyet-te-arsa-hisseli-yuksek-giris-daire-692116930/detay
/ilan/emlak-konut-satilik-temiz-emlaktan-petroliste-2-plus1-satilik-sifir-deniz-manzarali-671086029/detay
/ilan/emlak-konut-satilik-cemal-yalcin-dan-ozel-mimarili-luks-satilik-dubleks-623158476/detay
/ilan/emlak-konut-satilik-la-marin-kartal-da-site-icerisinde-ozel-bahce-kati-sifir-daire-645480180/detay
/ilan/emlak-konut-satilik-sen-kardeslerden-merkezde-3-plus1%2Ccok-temiz-satilik-daire%2C350.000tl-692103788/detay
/ilan/emlak-konut-satilik-kartal-petrol-is-mah-de-3-plus1-deniz-manzarali-yatirimlik-daire-619762304/detay
/ilan/emlak-konut-satilik-remax-red-rukiye-korkmaz-dan-panorama-velpark-ta-esyali-1-plus1-616596826/detay
/ilan/emlak-konut-satilik-yakacik-Demerli-twinstar-sitesi-ultra-luks-174-m2-3-plus1-daire-692104680/detay
/ilan/emlak-konut-satilik-kartal-soganlikta-yatirimlik-kiracili-firsat-2-plus1-daire-682793715/detay
/ilan/emlak-konut-satilik-Istmarinda-Deverli-taksitli-satilik-studyo-gulsen-yanmazdan-638548163/detay
/ilan/emlak-konut-satilik-sahibinden-satilik-kartal-merkezde-kaymakamligin-karsisinda-2-plus1-692054497/detay
/ilan/emlak-konut-satilik-petrolis...ara-kat-2-plus1-110-m2-lux-panjurlu...carsiya-yakin-692100683/detay
/ilan/emlak-konut-satilik-ful-deniz-manzarali-3-plus1-ana-yola-cok-yakin-115m2-sifir-daire-585807696/detay
/ilan/emlak-konut-satilik-kartal-karlitepe-de-ters-dublek-2-plus2-satilik-daire-692085141/detay
/ilan/emlak-konut-satilik-kartal-dap-yapi-istmarina-full-deniz-manzarali-2-plus1-satilik-621795699/detay
/ilan/emlak-konut-satilik-aybars-dan-site-icinde-havuzlu-satilik-daire-671063936/detay
/ilan/emlak-konut-satilik-soganlik-yeni-mah-5-yillik-binada-adalar-manzarali-satilik-dair-679308838/detay
/ilan/emlak-konut-satilik-kartal-soganlik-orta-mah-e-5-yani-yeni-bina-kelepir-daire-573785719/detay
/ilan/emlak-konut-satilik-sahibinden-site-icerisinde-1-plus1-644746509/detay
/ilan/emlak-konut-satilik-3-plus1-luks-sitede-646420303/detay
/ilan/emlak-konut-satilik-mirac-dan-ayazma-koru-da-lux-yapili-3-plus1-135m2-masrafsiz-daire-535382195/detay
/ilan/emlak-konut-satilik-sahibinden-site-icerisinde-3-plus1-644729603/detay
/ilan/emlak-konut-satilik-cevizli-de-satilik-daire-2-plus1-lux-85-m2-671030197/detay
/ilan/emlak-konut-satilik-esentepe-de-bahceli-acik-otoparkli-125m2-ferah-kullansli-daire-670847710/detay
/ilan/emlak-konut-satilik-atalarda-ara-katta-sifir-binada-2-plus1-85-m2-otoparkli-510436215/detay
/ilan/emlak-konut-satilik-sahil-mesa-marmara-10.kat-122m2-deniz-manzarali-0-satilik-3-plus1-692085951/detay
/ilan/emlak-konut-satilik-kartal-da-sifir-ara-kat-3-plus1-satilik-daire-692090351/detay
/ilan/emlak-konut-satilik-pega-kartal-satis-ofisinden-2-plus1-kat-mulkiyetli-hemen-teslim-644626657/detay
/ilan/emlak-konut-satilik-adalilar-dan-kartal-hurriyet-mah-de-satilik-kelepir-3-plus1-dublex-682761629/detay
/ilan/emlak-konut-satilik-kartal-kordonboyunda-2-plus1-sifir-daire-647037679/detay
/ilan/emlak-konut-satilik-aklife-den_yakacik_carsi_mah_ultra_lux_katta_tek_sifir_2-plus1-654883140/detay
/ilan/emlak-konut-satilik-aklife-den_yakacik_da_mukanbel_yapi_kaliteli_3-plus1_arakat_sifir-657772595/detay
/ilan/emlak-konut-satilik-ciceksan-insaat-dan-3-plus1-daireler-hemen-tapu-hemen-teslim-682770303/detay
/ilan/emlak-konut-satilik-satilik-daire-ofis-2-1-85-mt-klepir-634724740/detay
/ilan/emlak-konut-satilik-ricar-dan%2C7-24-guvenlik%2Cyuzme-havuzu%2Ckapali-otopark%2Csifir%2Csitede-682744629/detay
/ilan/emlak-konut-satilik-...-kat-649504313/detay
/ilan/emlak-konut-satilik-mertcan-dan-e5-e-Yurme-mesafesinde-iskanli-2-plus1-sifir-daire-692078490/detay
/ilan/emlak-konut-satilik-kartal-atalar-da-sahile-Yurme-mesafesinde-iskanli-masrafsiz-3-plus1-454709956/detay
/ilan/emlak-konut-satilik-tugcan-pala-dan-mesa-kartall-da-satilik-2-kat-buyuk-tip-2-plus1-670434988/detay
/ilan/emlak-konut-satilik-satilik-sifir-daire-soganlik-yeni-mah-2-plus1-kat-mulkiyetli-682522237/detay
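The `href` values in this output are site-relative paths, so they need the site's base URL before they can be requested. A small sketch using the standard library's `urljoin`; the `BASE_URL` constant and the `absolute_links` helper are names introduced here for illustration, and the two sample paths are copied from the output:

```python
from urllib.parse import urljoin

# Scheme and host taken from the URL used in the snippet.
BASE_URL = 'http://www.sahibinden.com/'


def absolute_links(hrefs, base=BASE_URL):
    """Join each relative href onto the base URL, skipping empty values."""
    return [urljoin(base, href) for href in hrefs if href]


sample = [
    '/ilan/emlak-konut-satilik-kartal-kordonboyunda-2-plus1-sifir-daire-647037679/detay',
    '/ilan/emlak-konut-satilik-sahibinden-site-icerisinde-1-plus1-644746509/detay',
]
for link in absolute_links(sample):
    print(link)
```

Skipping empty values matters because `a.get('href')` returns `None` for an anchor that has no `href` attribute.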
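Can you post the code that raises the exception?
@JackFleeting I have posted it. Doesn't it raise the exception on your computer?
The problem is not bs4 but the connection you are trying to establish to the URL. You can find the answer in your own code: set the headers and pass them to requests.get. I just ran it here and it works as expected.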
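One way to confirm that the parsing step is not at fault is to run it on a small inline document, with no network involved. The sketch below assumes the listing links carry the CSS class `classifiedTitle`, as in the snippet; the HTML fragment is invented for the check, and `html.parser` is used only so that `lxml` is not required. It also shows that `soup('a', 'classifiedTitle')` is shorthand for `find_all` with a class filter:

```python
from bs4 import BeautifulSoup

# Invented fragment that mimics the structure the selector expects.
html = '''
<table>
  <tr><td><a class="classifiedTitle" href="/ilan/example-1/detay">First</a></td></tr>
  <tr><td><a class="other" href="/ignored">Skip me</a></td></tr>
  <tr><td><a class="classifiedTitle" href="/ilan/example-2/detay">Second</a></td></tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')

# Calling the soup object is equivalent to soup.find_all(...);
# a string as the second argument filters on the CSS class.
short_form = [a.get('href') for a in soup('a', 'classifiedTitle')]
long_form = [a.get('href') for a in soup.find_all('a', class_='classifiedTitle')]

print(short_form)
```

If this finds both links while the live request still times out, the failure lies in the connection rather than in BeautifulSoup.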