AttributeError on .text when scraping company pages with Python BeautifulSoup
from bs4 import BeautifulSoup
import requests
r = requests.get("")
soup = BeautifulSoup(r.text,'lxml')
for links in soup.find_all('a',class_='m_company_link'):
    href = links['href']
    headers = {'User-Agent': 'Googleboat'}
    r = requests.get("https://www.yelu.in/"+href,headers = headers)
    soup = BeautifulSoup(r.text,'lxml')
    company = {
        "company_name" : soup.select_one('#company_name').text,
        "address" : soup.select_one('div.text.location').text,
        "phone" : soup.select_one('div.text.phone').text,
        "mobile_phone" : soup.find('div',string = "Mobile phone").find_next_sibling('div').text,
        "fax": soup.find('div',string = "Fax").find_next_sibling('div').text,
        "website" : soup.find('div',string = "Website").find_next_sibling('div').text,
        "year" :soup.find('span',string = "Establishment year").next_sibling,
        "employees" :soup.find('span',string = "Employees").next_sibling,
        "manager" :soup.find('span',string = "Company manager").next_sibling
    }
    print(company)
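I get the following error:
Traceback (most recent call last):
  File "C:\Python27\yelu.py", line 14, in <module>
    "company_name" : soup.select_one('#company_name').text,
AttributeError: 'NoneType' object has no attribute 'text'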
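Your URL construction is wrong. Get rid of the double slash in "https://www.yelu.in/" + href, i.e. make sure the base URL and href are not both contributing a "/" at the point where they are joined.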
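One way to avoid the doubled slash, sketched here rather than taken from the original answer, is to let urllib.parse.urljoin from the standard library do the joining. The helper name build_company_url and the example path are hypothetical, and the sketch assumes the scraped href is root-relative (starts with "/"). The import shown is for Python 3; on Python 2 urljoin lives in the urlparse module.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.yelu.in/"


def build_company_url(href, base=BASE_URL):
    """Join the site root and a scraped href without doubling the slash."""
    return urljoin(base, href)


# Hypothetical root-relative href, used only for illustration.
example_href = "/company/12345/example"

# urljoin resolves the href against the base, so only one slash remains.
print(build_company_url(example_href))

# Plain concatenation keeps both slashes, which is the problem described above.
print(BASE_URL + example_href)
```

urljoin also handles an href without a leading slash, so the same helper works whichever form the site uses.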
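If you check the current response, you will see that the page is not found. That is why select_one returns None and the .text access fails.
See the output of the following: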
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Googleboat'}
r = requests.get('https://www.yelu.in/category/advertising')
soup = BeautifulSoup(r.text,'lxml')
for links in soup.find_all('a',class_='m_company_link'):
    href = links['href']
    try:
        # Same URL construction as in the question, kept to show the failing requests
        r = requests.get("https://www.yelu.in/" + href, headers = headers)
        soup = BeautifulSoup(r.text,'lxml')
        company = {
            "company_name" : soup.select_one('#company_name').text,
            "address" : soup.select_one('div.text.location').text,
            "phone" : soup.select_one('div.text.phone').text,
            "mobile_phone" : soup.find('div',string = "Mobile phone").find_next_sibling('div').text,
            "fax": soup.find('div',string = "Fax").find_next_sibling('div').text,
            "website" : soup.find('div',string = "Website").find_next_sibling('div').text,
            "year" :soup.find('span',string = "Establishment year").next_sibling,
            "employees" :soup.find('span',string = "Employees").next_sibling,
            "manager" :soup.find('span',string = "Company manager").next_sibling
        }
        print(company)
    except AttributeError as e:
        # A lookup returned None: print the requested URL and its HTTP status code
        print("https://www.yelu.in/" + href, r.status_code)
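Even with a correct URL, a company page may lack some fields, and any missing element makes find or select_one return None. As a possible complement to the try/except above, here is a minimal sketch of None-safe lookups. The helpers text_or_none and labelled_value and the sample HTML are hypothetical and not from the original answer. The sketch uses the built-in html.parser so it runs without lxml.

```python
from bs4 import BeautifulSoup


def text_or_none(node):
    """Return the stripped text of a tag, or None if the lookup found nothing."""
    return node.get_text(strip=True) if node is not None else None


def labelled_value(soup, label):
    """Return the text of the <div> following the <div> whose string is label."""
    label_node = soup.find('div', string=label)
    if label_node is None:
        return None
    return text_or_none(label_node.find_next_sibling('div'))


# Invented markup for illustration only; the real page layout may differ.
SAMPLE_HTML = (
    '<div id="company_name">Example Ltd</div>'
    '<div class="label">Fax</div><div class="text">000-000</div>'
)

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
company = {
    "company_name": text_or_none(soup.select_one('#company_name')),
    "phone": text_or_none(soup.select_one('div.text.phone')),  # absent here
    "fax": labelled_value(soup, "Fax"),
    "website": labelled_value(soup, "Website"),  # absent here
}
print(company)
```

Missing fields come back as None instead of raising AttributeError, so one incomplete page does not discard the fields that were found.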