
Python 2.7 not recognizing link class


I'm very new to programming and Python, and I'm trying to write this simple scraper to pull all of the therapists' profile URLs from this page:

import requests
from bs4 import BeautifulSoup

def tru_crawler(max_pages):
    p = '&page='
    page = 1
    while page ...

The findAll() arguments you have don't make sense. As written, it reads: find all of one tag, with another tag passed as an argument. Use a CSS selector instead:

for link in soup.select('div.member-summary h2 a'):
    href = 'http://www.therapy-directory.org.uk' + link.get('href')
    yield href + '\n'
    print(href)
The CSS selector above reads: find the div tag whose class equals "member-summary", then inside that div find the h2 tag, then inside that h2 find the a tags.
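For comparison, the same lookup written with nested find() / find_all() calls would look roughly like this (a sketch, not from the original answer):

for summary in soup.find_all('div', class_='member-summary'):   # each <div class="member-summary">
    h2 = summary.find('h2')                                      # the <h2> inside that div
    if h2 is None:
        continue
    for link in h2.find_all('a'):                                # each <a> inside that <h2>
        print('http://www.therapy-directory.org.uk' + link.get('href'))

The select() one-liner expresses the same nesting more compactly.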

Working example:

import requests
from bs4 import BeautifulSoup

p = '&page='
page = 1
url = 'http://www.therapy-directory.org.uk/search.php?search=Sheffield&distance=40&services[23]=on&services=23&business_type[individual]=on&uqs=626693' + p + str(page)
code = requests.get(url)
text = code.text
soup = BeautifulSoup(text)
for link in soup.select('div.member-summary h2 a'):
    href = 'http://www.therapy-directory.org.uk' + link.get('href')
    print(href)
Output (trimmed; 26 links in total):

http://www.therapy-directory.org.uk/therapists/lesley-lister?uqs=626693
http://www.therapy-directory.org.uk/therapists/fiona-jeffrey?uqs=626693
http://www.therapy-directory.org.uk/therapists/ann-grant?uqs=626693
.....
.....
http://www.therapy-directory.org.uk/therapists/jan-garbutt?uqs=626693
Thanks, but it still doesn't return anything :(

@pb_ng hmm.. works for me (it printed a bunch of links). See the updated answer for how I tried it.

Thanks, so removing "yield href + '\n'" made it work. If you don't mind me asking, why did it return nothing when yield was used? One more thing: I noticed the while loop isn't working. It doesn't go on to page 2 and grab those links. How do I fix that?

Did that, and it still only scrapes page 1. Could you take a look? Thanks a lot!
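On the yield question: tru_crawler is a generator function, so calling tru_crawler(3) only creates a generator object; none of the body (including the print) runs until something iterates over it, which is why the yield version appeared to return nothing. A rough sketch of a paginated generator and how to consume it follows; the loop condition and the page += 1 line are assumptions, since the question's while loop is cut off above:

import requests
from bs4 import BeautifulSoup

BASE = 'http://www.therapy-directory.org.uk'
SEARCH = (BASE + '/search.php?search=Sheffield&distance=40'
          '&services[23]=on&services=23&business_type[individual]=on&uqs=626693')

def tru_crawler(max_pages):
    # Generator: nothing below runs until the caller iterates over the result.
    page = 1
    while page <= max_pages:   # assumed loop condition
        soup = BeautifulSoup(requests.get(SEARCH + '&page=' + str(page)).text)
        for link in soup.select('div.member-summary h2 a'):
            yield BASE + link.get('href')
        page += 1              # without an increment the loop would keep re-fetching page 1

# Iterating over the generator is what actually drives the requests:
for href in tru_crawler(3):
    print(href)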
Traceback (most recent call last):
File "C:/Users/PB/PycharmProjects/crawler/crawler-revised.py", line    19,      enter code here`in <module>
tru_crawler(3)
File "C:/Users/PB/PycharmProjects/crawler/crawler-revised.py", line 9, in tru_crawler
code = requests.get(url)
File "C:\Python27\lib\requests\api.py", line 68, in get
return request('get', url, **kwargs)
File "C:\Python27\lib\requests\api.py", line 50, in request
response = session.request(method=method, url=url, **kwargs)
File "C:\Python27\lib\requests\sessions.py", line 464, in request
resp = self.send(prep, **send_kwargs)
File "C:\Python27\lib\requests\sessions.py", line 602, in send
history = [resp for resp in gen] if allow_redirects else []
File "C:\Python27\lib\requests\sessions.py", line 195, in resolve_redirects
allow_redirects=False,
File "C:\Python27\lib\requests\sessions.py", line 576, in send
r = adapter.send(request, **kwargs)
File "C:\Python27\lib\requests\adapters.py", line 415, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.',  BadStatusLine("''",))
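The traceback above ends in requests.exceptions.ConnectionError('Connection aborted.', BadStatusLine("''")), which means the server closed the connection before sending a valid HTTP status line. One common mitigation, purely as a hedged sketch (the fetch helper and its parameters are made up for illustration), is to send a browser-style User-Agent header and retry:

import requests

# Hypothetical workaround for the "Connection aborted" / BadStatusLine error above:
# some servers drop requests that carry no browser-like User-Agent header.
HEADERS = {'User-Agent': 'Mozilla/5.0'}

def fetch(url, retries=3):
    for attempt in range(retries):
        try:
            return requests.get(url, headers=HEADERS, timeout=10)
        except requests.exceptions.ConnectionError:
            if attempt == retries - 1:   # give up after the last retry
                raise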