Getting a `.text` error when running the code on company pages with Python BeautifulSoup


from bs4 import BeautifulSoup
import requests

r = requests.get("")  # listing-page URL omitted in the original post
soup = BeautifulSoup(r.text, 'lxml')

for links in soup.find_all('a', class_='m_company_link'):
    href = links['href']

    headers = {'User-Agent': 'Googleboat'}
    r = requests.get("https://www.yelu.in/" + href, headers=headers)
    soup = BeautifulSoup(r.text, 'lxml')
    company = {
        "company_name": soup.select_one('#company_name').text,
        "address": soup.select_one('div.text.location').text,
        "phone": soup.select_one('div.text.phone').text,
        "mobile_phone": soup.find('div', string="Mobile phone").find_next_sibling('div').text,
        "fax": soup.find('div', string="Fax").find_next_sibling('div').text,
        "website": soup.find('div', string="Website").find_next_sibling('div').text,
        "year": soup.find('span', string="Establishment year").next_sibling,
        "employees": soup.find('span', string="Employees").next_sibling,
        "manager": soup.find('span', string="Company manager").next_sibling
    }
    print(company)
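I get the following error:

Traceback (most recent call last):
  File "C:\Python27\yelu.py", line 14, in <module>
    "company_name" : soup.select_one('#company_name').text,
AttributeError: 'NoneType' object has no attribute 'text'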


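Your URL construction is wrong. Get rid of the doubled `/` at `in/" + href`: the base URL already ends with `/`, so an `href` that starts with `/` produces `https://www.yelu.in//...`.

That is, build the URL as `"https://www.yelu.in" + href`.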
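Rather than concatenating strings by hand, the standard library's `urllib.parse.urljoin` can resolve each `href` against the site root, so it does not matter whether the `href` starts with `/`. A minimal sketch, assuming the `href` values are site-relative paths; the `company_url` helper and the sample paths are hypothetical, not taken from yelu.in:

```python
from urllib.parse import urljoin

BASE_URL = "https://www.yelu.in/"


def company_url(href: str) -> str:
    # urljoin resolves href against BASE_URL, so a leading "/" in href
    # does not create a double slash the way plain "+" concatenation does.
    return urljoin(BASE_URL, href)


# Hypothetical href values; the real ones on the listing page may differ.
print(company_url("/company/123/example"))
print(company_url("company/123/example"))
# The original concatenation, for comparison (note the "//"):
print("https://www.yelu.in/" + "/company/123/example")
```

Inside the loop, `requests.get(company_url(href), headers=headers)` would then replace the hand-built URL.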

If you inspect the current responses, you will see that the server returns "page not found".

See the output of the following:

from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Googleboat'}
r = requests.get('https://www.yelu.in/category/advertising')
soup = BeautifulSoup(r.text, 'lxml')

for links in soup.find_all('a', class_='m_company_link'):
    href = links['href']
    try:
        # Same URL construction as in the question, to show what the server returns
        r = requests.get("https://www.yelu.in/" + href, headers=headers)
        soup = BeautifulSoup(r.text, 'lxml')
        company = {
            "company_name": soup.select_one('#company_name').text,
            "address": soup.select_one('div.text.location').text,
            "phone": soup.select_one('div.text.phone').text,
            "mobile_phone": soup.find('div', string="Mobile phone").find_next_sibling('div').text,
            "fax": soup.find('div', string="Fax").find_next_sibling('div').text,
            "website": soup.find('div', string="Website").find_next_sibling('div').text,
            "year": soup.find('span', string="Establishment year").next_sibling,
            "employees": soup.find('span', string="Employees").next_sibling,
            "manager": soup.find('span', string="Company manager").next_sibling
        }
        print(company)
    except AttributeError:
        # A lookup matched nothing (select_one/find returned None):
        # print the requested URL and its HTTP status code instead
        print("https://www.yelu.in/" + href, r.status_code)

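The `AttributeError` itself comes from `select_one()` and `find()` returning `None` when nothing matches, which is what happens on a "page not found" response. Even with the URL fixed, some company pages may lack a field (fax, website, ...). A sketch of guarding each lookup, assuming the page structure implied by the selectors above; the helper names `text_or_none`, `sibling_text` and `parse_company` and the sample HTML are hypothetical, and it uses the built-in `"html.parser"` instead of `lxml` only to avoid an extra dependency:

```python
from bs4 import BeautifulSoup


def text_or_none(tag):
    # select_one()/find() return None when nothing matches; guard before .text
    return tag.get_text(strip=True) if tag is not None else None


def sibling_text(soup, label):
    # Find the <div> whose text is exactly `label`, then read the next <div> sibling
    label_div = soup.find("div", string=label)
    if label_div is None:
        return None
    return text_or_none(label_div.find_next_sibling("div"))


def parse_company(html):
    soup = BeautifulSoup(html, "html.parser")
    return {
        "company_name": text_or_none(soup.select_one("#company_name")),
        "address": text_or_none(soup.select_one("div.text.location")),
        "phone": text_or_none(soup.select_one("div.text.phone")),
        "mobile_phone": sibling_text(soup, "Mobile phone"),
        "fax": sibling_text(soup, "Fax"),
        "website": sibling_text(soup, "Website"),
    }


# Hypothetical markup mimicking the selectors above; not copied from yelu.in.
sample = """
<div id="company_name">Example Ads</div>
<div class="text location">Mumbai</div>
<div><div>Fax</div><div>022-000000</div></div>
"""
print(parse_company(sample))
print(parse_company("<html><body>Page not found</body></html>"))
```

`parse_company(r.text)` could replace the dictionary literal inside the loop, so a missing field shows up as `None` instead of aborting the run. This only hides missing elements, though: a 404 page still yields an all-`None` record, so checking `r.status_code` first remains worthwhile.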