I get a text error when applying this code to companies using Python BeautifulSoup

python pandas web-scraping

from bs4 import BeautifulSoup
import requests

r = requests.get("")
soup = BeautifulSoup(r.text, 'lxml')

for links in soup.find_all('a', class_='m_company_link'):
    href = links['href']

    headers = {'User-Agent': 'Googleboat'}
    r = requests.get("https://www.yelu.in/" + href, headers=headers)
    soup = BeautifulSoup(r.text, 'lxml')
    company = {
        "company_name": soup.select_one('#company_name').text,
        "address": soup.select_one('div.text.location').text,
        "phone": soup.select_one('div.text.phone').text,
        "mobile_phone": soup.find('div', string="Mobile phone").find_next_sibling('div').text,
        "fax": soup.find('div', string="Fax").find_next_sibling('div').text,
        "website": soup.find('div', string="Website").find_next_sibling('div').text,
        "year": soup.find('span', string="Establishment year").next_sibling,
        "employees": soup.find('span', string="Employees").next_sibling,
        "manager": soup.find('span', string="Company manager").next_sibling
    }
    print(company)

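I get the following error:

Traceback (most recent call last):
  File "C:\Python27\yelu.py", line 14, in <module>
    "company_name" : soup.select_one('#company_name').text,
AttributeError: 'NoneType' object has no attribute 'text'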

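Your URL construction is wrong. Get rid of the double slash produced by

in/" + href

If you check the current response, you will see that the page was not found. The "not found" page has no #company_name element, so select_one returns None, and reading .text on None raises the AttributeError.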
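One way to avoid the doubled slash is to let the standard library join the two parts instead of concatenating strings. This is only a sketch: the helper name company_url and the example href are made up for illustration, and it assumes the href values on the listing page start with "/". On Python 2 the same function is imported from the urlparse module instead.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.yelu.in/"


def company_url(href):
    """Join the site root and a link's href without doubling the slash."""
    return urljoin(BASE_URL, href)


# "/company/example" is a hypothetical href, not one taken from the site.
print(company_url("/company/example"))
```

urljoin also copes with an href that has no leading slash, so the same call works for both shapes of link.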

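See the output of the following, which prints the URL and HTTP status code of every request that fails to parse: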

from bs4 import BeautifulSoup 
import requests

headers = {'User-Agent': 'Googleboat'}
r = requests.get('https://www.yelu.in/category/advertising') 
soup = BeautifulSoup(r.text,'lxml')

for links in soup.find_all('a',class_='m_company_link'): 
    href = links['href']
    try:
        r = requests.get("https://www.yelu.in/" + href, headers = headers)
        soup = BeautifulSoup(r.text,'lxml')
        company = {
            "company_name": soup.select_one('#company_name').text,
            "address": soup.select_one('div.text.location').text,
            "phone": soup.select_one('div.text.phone').text,
            "mobile_phone": soup.find('div', string="Mobile phone").find_next_sibling('div').text,
            "fax": soup.find('div', string="Fax").find_next_sibling('div').text,
            "website": soup.find('div', string="Website").find_next_sibling('div').text,
            "year": soup.find('span', string="Establishment year").next_sibling,
            "employees": soup.find('span', string="Employees").next_sibling,
            "manager": soup.find('span', string="Company manager").next_sibling
        }
        print(company)
    except AttributeError as e:
        print("https://www.yelu.in/" + href, r.status_code)

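Even with a correct URL, a company page may lack some of these fields (no fax, no website), and each chained .text would then fail the same way. A possible guard is sketched below. The helper names text_or_none and parse_company and the sample HTML are assumptions for illustration, and the sketch uses the built-in "html.parser" so it runs without lxml.

```python
from bs4 import BeautifulSoup


def text_or_none(node):
    """Return the stripped text of a tag, or None when the lookup found nothing."""
    return node.get_text(strip=True) if node is not None else None


def parse_company(html):
    """Extract two of the fields, tolerating elements that are missing."""
    soup = BeautifulSoup(html, "html.parser")
    fax_label = soup.find("div", string="Fax")
    fax_value = fax_label.find_next_sibling("div") if fax_label else None
    return {
        "company_name": text_or_none(soup.select_one("#company_name")),
        "fax": text_or_none(fax_value),
    }


# Hypothetical markup: a page with a name but no fax entry.
sample = '<div id="company_name"> Acme Ads </div>'
print(parse_company(sample))
```

A missing field then shows up as None in the dictionary instead of aborting the whole record, and the other fields can be handled with the same pattern.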