Python BeautifulSoup returning None

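I'm trying to scrape Airbnb with BeautifulSoup. name returns None when there should be text. Am I doing something wrong? I'm just starting to learn, so there's a good chance I'm missing something simple.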

from bs4 import BeautifulSoup
import requests
from requests_html import HTMLSession
import lxml


def extract_features(listing_html):
    features_dict = {}
    name = listing_html.find('div', {'class': '_xcsyj0'})
    features_dict['name'] = name
    return features_dict
      
def getdata(url):
    s = requests.get(url)
    soup = BeautifulSoup(s.content, 'html.parser')
    return soup

def getnextpage(soup):
    page = 'https://www.airbnb.com' + str(soup.find('a').get('href'))
    return page

url = 'https://www.airbnb.com/s/Bear-Creek-Lake-Dr--Jim-Thorpe--PA--USA/homes?tab_id=home_tab'
soup = getdata(url)
listings = soup.find_all('div', '_8s3ctt')  # (tag name, CSS class)

for listing in listings[0:2]:    
    full_url = getnextpage(listing)
    ind_soup = getdata(full_url)
    features = extract_features(ind_soup)
    print(features)
    print("-----------")

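You are trying to find a div tag with the class _xcsyj0. However, when I look at the URL you posted in the comments, no tag on that page has this class, so find() has nothing to match and returns None. This often happens with auto-generated CSS class names, and it makes web-scraping tasks like this one hard to get right.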
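To make the point about a missing class concrete, here is a minimal sketch. It uses a made-up HTML snippet, not Airbnb's real markup; the class name _abc123 and the helper extract_name are assumptions for illustration only. It shows that find() returns None when no tag carries the requested class, and that checking for None before reading the text avoids an AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class on the page differs from the one searched for,
# which is what happens when auto-generated class names change.
SAMPLE_HTML = """
<html><body>
  <div class="_abc123">Entire house hosted by Barbarann</div>
</body></html>
"""


def extract_name(soup, class_name):
    """Return the stripped text of the first div with class_name, or None."""
    tag = soup.find("div", {"class": class_name})
    if tag is None:
        # find() returns None when nothing matches; calling .text would raise.
        return None
    return tag.get_text(strip=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
missing = extract_name(soup, "_xcsyj0")  # class not present in the sample
found = extract_name(soup, "_abc123")    # class present in the sample
print(missing)
print(found)
```

If the first call prints None for the real page as well, the class is simply not in the HTML that was downloaded, whatever the browser's inspector shows.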

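It looks like the div tag you want doesn't have an id set either, so your best option is probably an Airbnb API. From a cursory search, I found an official API (which seems to be available only to verified partners) and a few unofficial ones. Unfortunately, I haven't used any Airbnb API, so I can't say which of them is good or still works.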

I ran the code here and got a result. Please describe the problem in more detail so it is easier to debug.

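Which URL returns None? Is it the soup = getdata(url) line or the one inside the loop?

name returns None.

Do you know which URL is causing the problem? Is it the one you hard-coded or the one extracted from that page?

It's the one extracted from that page. I tried higher-level classes and they return a value. One of the listings is at this URL. (). I'm trying to get the text "Entire house hosted by Barbarann".

First, you can use print() to see which page has the problem. Next, you can use print() to check what you actually get in s.content; maybe you are getting a warning or a captcha meant for bots, spammers, or hackers. You should also check in a web browser whether the page works with JavaScript turned off, because requests and BeautifulSoup can't run JavaScript.

Try listing_html.find('div', {'class': '_xcsyj0'}).text, it may show the result.

This is what I get back:

{'name': None}
-----------
{'name': None}
-----------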
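The debugging steps suggested in the comments can be sketched roughly as follows. This is only an illustration under stated assumptions: build_listing_url and diagnose are hypothetical helper names, the path /rooms/123 and both HTML strings are made-up stand-ins for what requests would download, and nothing here contacts Airbnb. The idea is to print the URL that is actually being followed and to summarise the HTML that actually came back, so that a page whose content is rendered by JavaScript is easy to spot.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.airbnb.com"


def build_listing_url(listing):
    """Return the absolute URL of the first link in a listing tag, or None."""
    link = listing.find("a", href=True)
    if link is None:
        # Avoids building a bogus URL such as 'https://www.airbnb.comNone'.
        return None
    return urljoin(BASE_URL, link["href"])


def diagnose(html, class_name):
    """Summarise the downloaded HTML to show what the parser really sees."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "length": len(html),
        "title": soup.title.get_text(strip=True) if soup.title else None,
        "has_class": soup.find("div", {"class": class_name}) is not None,
        "script_tags": len(soup.find_all("script")),
    }


# Hypothetical stand-ins for one search result and one downloaded listing page.
LISTING_HTML = '<div class="_8s3ctt"><a href="/rooms/123">Listing</a></div>'
PAGE_HTML = (
    "<html><head><title>Example</title><script>render()</script></head>"
    '<body><div id="root"></div></body></html>'
)

listing = BeautifulSoup(LISTING_HTML, "html.parser")
listing_url = build_listing_url(listing)
report = diagnose(PAGE_HTML, "_xcsyj0")
print(listing_url)
print(report)
```

With the code from the question, the text of the response from requests.get() could be passed to diagnose. A report with has_class False and an almost empty body next to several script tags would suggest the content is filled in by JavaScript, which requests and BeautifulSoup do not execute.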
