AttributeError: 'NoneType' object has no attribute 'get_text' (Python 3.x)


python python-3.x web-scraping


I have been using the following code:

import requests
from bs4 import BeautifulSoup as bs


def MainPageSpider(max_pages):
    page = 1
    while page <= max_pages:
        url = 'url' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = bs(plain_text, 'html.parser')
        for link in soup.findAll(attrs={'class':'col4'}):
            href = 'url' + link.a['href']
            title = link.span.text

            PostPageItems(href)
        page += 1


def PostPageItems(post_url):
    source_code = requests.get(post_url)
    plain_text = source_code.text
    soup = bs(plain_text, 'html.parser')
    for items in soup.findAll(attrs={'class':'container'}):
        title2 = items.find('h1', {'class':'title'}).get_text()

        print(title2)




MainPageSpider(1)

Not every container has an h1, so just check whether None was returned, and print only when it was not:
for items in soup.findAll(attrs={'class':'container'}):
    title2 = items.find('h1', {'class':'title'})
    if title2:
        print(title2.text)

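Judging from the output you get without get_text(), title2 is often None. Calling get_text() on it fails because None has no get_text() attribute. You can split the line into two statements and add a check like this:

title2_item = items.find('h1', {'class': 'title'})
if title2_item:  # check for None
    title2 = title2_item.get_text()
    print(title2)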
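As a rough illustration of that check, here is a minimal sketch that runs against inline HTML instead of a live page. The sample markup, SAMPLE_HTML, and extract_titles are made up for this example and are not part of the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the middle container has no <h1 class="title">.
SAMPLE_HTML = """
<div class="container"><h1 class="title">First post</h1></div>
<div class="container"><p>No heading here</p></div>
<div class="container"><h1 class="title">Second post</h1></div>
"""


def extract_titles(html):
    """Return the text of every h1.title, skipping containers without one."""
    soup = BeautifulSoup(html, 'html.parser')
    titles = []
    for container in soup.find_all(attrs={'class': 'container'}):
        heading = container.find('h1', {'class': 'title'})
        if heading is not None:  # find() returns None when nothing matches
            titles.append(heading.get_text(strip=True))
    return titles


print(extract_titles(SAMPLE_HTML))
```

Comparing against None explicitly makes the intent clear: the guard exists only to skip containers where find() matched nothing.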
Alternatively, rewrite it using a CSS selector that selects only the elements that match:

for item in soup.select('.container h1.title'):
    title2 = item.text
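The selector approach can be sketched the same way. Again, SAMPLE_HTML and select_titles are hypothetical names used only for this illustration, and the markup is invented.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the middle container has no <h1 class="title">.
SAMPLE_HTML = """
<div class="container"><h1 class="title">First post</h1></div>
<div class="container"><p>No heading here</p></div>
<div class="container"><h1 class="title">Second post</h1></div>
"""


def select_titles(html):
    """Collect the text of each h1.title nested inside a .container element."""
    soup = BeautifulSoup(html, 'html.parser')
    # select() only yields elements that matched, so there is no None to guard against.
    return [h1.get_text(strip=True) for h1 in soup.select('.container h1.title')]


print(select_titles(SAMPLE_HTML))
```

Note that '.container h1.title' matches an h1 at any depth inside a container, not only direct children, so it may pick up more headings than the loop version on deeply nested pages.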

If an item has no h1 entry, find returns None, which is why you see all those None values between the h1 results in your output. You need to handle that case.
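To see where the error comes from, the following small sketch reproduces it on a single container with no matching heading. The markup is invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a container that lacks an <h1 class="title">.
html = '<div class="container"><p>No heading here</p></div>'
soup = BeautifulSoup(html, 'html.parser')

container = soup.find(attrs={'class': 'container'})
heading = container.find('h1', {'class': 'title'})
print(heading)  # find() found nothing, so this prints None

error_message = None
try:
    heading.get_text()  # attribute lookup on None raises AttributeError
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```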