Python BeautifulSoup: exception while scraping HTML files in a loop


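I am trying to scrape a few variables from HTML files in a local folder, but partway through the loop I get an exception: AttributeError: 'NoneType' object has no attribute 'contents'. The problem is not specific to .contents. I have looked at the file the loop stops on, and its structure appears to be exactly the same as the others. If I remove .contents, the same exception is raised by the find() call alone. Does anyone know why this happens? Again, many of the files are processed without any problem. My code is below: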

import os

import pandas as pd
from bs4 import BeautifulSoup

df_list = []
folder = 'rt_html'
for movie_html in os.listdir(folder):
    with open(os.path.join(folder, movie_html)) as file:
        soup = BeautifulSoup(file)
        # Page title with the trailing ' - Rotten Tomatoes' removed
        title = soup.find('title').contents[0][:-len(' - Rotten Tomatoes')]
        # Audience score text with its last character dropped
        audience_score = soup.find('div', class_='audience-score meter').find('span').contents[0][:-1]
        # Number of audience ratings with the thousands separators removed
        num_audience_ratings = soup.find('div', class_='audience-info hidden-xs superPageFontColor')
        num_audience_ratings = num_audience_ratings.find_all('div')[1].contents[2].strip().replace(',', '')

        # print(num_audience_ratings)
        # break

        df_list.append({'title': title,
                        'audience_score': int(audience_score),
                        'number_of_audience_ratings': int(num_audience_ratings)})
df = pd.DataFrame(df_list, columns=['title', 'audience_score', 'number_of_audience_ratings'])

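My guess is that some of the files do not contain the elements you are looking for.

For example, if a file has no div with the class 'audience-score meter', then soup.find('div', class_='audience-score meter') returns None. Any subsequent .find or .contents on that result raises an AttributeError.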
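
A minimal illustration of this behavior, assuming a made-up HTML snippet rather than one of the actual Rotten Tomatoes files (the markup and the "html.parser" choice below are assumptions for demonstration only):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: it has a <title> but no 'audience-score meter' div.
html = "<html><head><title>Some Movie - Rotten Tomatoes</title></head><body></body></html>"
soup = BeautifulSoup(html, "html.parser")

# find() returns None when nothing matches; it does not raise by itself.
missing = soup.find("div", class_="audience-score meter")
print(missing is None)

# Chaining another call onto None is what raises the AttributeError.
error_message = None
try:
    missing.find("span")
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

This is why the traceback points at whichever call is chained directly after the failing find(): removing .contents only moves the error to the next attribute access on None.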

The solution is to wrap the lookup in try/except and set the value to an empty string:
try:
    audience_score = soup.find('div', class_='audience-score meter').find('span').contents[0][:-1]
except AttributeError:
    audience_score = ""

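Do the same for title and num_audience_ratings (both assignments).

Good troubleshooting advice. I am still not sure why this happens, since the markup I found is the same as in the other files in the loop. Still, this will help me get past it for now, thanks.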
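
One way to find out which file and which field is actually failing is to check each lookup explicitly and record what was missing. The sketch below is only an illustration under stated assumptions: parse_movie is a hypothetical helper, the two sample pages are invented, "html.parser" is an arbitrary parser choice, and only two of the three fields are handled. It stores None instead of an empty string, because int("") raises a ValueError.

```python
from bs4 import BeautifulSoup


def parse_movie(html):
    """Hypothetical helper: return (record, missing) for one page.

    'missing' lists the fields whose elements were not found.
    """
    soup = BeautifulSoup(html, "html.parser")
    record = {"title": None, "audience_score": None}
    missing = []

    title_tag = soup.find("title")
    if title_tag is not None and title_tag.string:
        record["title"] = title_tag.string.replace(" - Rotten Tomatoes", "").strip()
    else:
        missing.append("title")

    # Check each step before chaining the next call onto it.
    score_div = soup.find("div", class_="audience-score meter")
    score_span = score_div.find("span") if score_div is not None else None
    if score_span is not None:
        # Assumes the span text looks like '85%'; int() would raise otherwise.
        record["audience_score"] = int(score_span.get_text(strip=True).rstrip("%"))
    else:
        missing.append("audience_score")

    return record, missing


# Invented pages for demonstration only.
complete = ('<title>Movie A - Rotten Tomatoes</title>'
            '<div class="audience-score meter"><span>85%</span></div>')
incomplete = "<title>Movie B - Rotten Tomatoes</title>"

complete_result = parse_movie(complete)
incomplete_result = parse_movie(incomplete)
print(complete_result)
print(incomplete_result)
```

Inside the original loop, printing movie_html whenever the missing list is non-empty would show exactly which file lacks which element, which may help explain why a file that looks identical still fails.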