Getting UnboundLocalError when scraping with Python


python web-scraping


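I get an error when scraping:

UnboundLocalError: local variable 'tag' referenced before assignment

It seems to be caused by this line:

--> 17 return tag.select_one(".b-plainlist__date").text, tag.select_one(".b-plainlist__title").text, tag.find_next(class_="b-plainlist__announce").text.strip()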
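A minimal reproduction outside of scraping, using a hypothetical `last_item` function: a `for` loop only assigns its target when the iterable yields at least one item, so a name read after the loop is unbound whenever the iterable is empty. In the traceback above, that iterable would be the result of `soup.select(...)`.

```python
def last_item(items):
    """Hypothetical helper: return the last element the loop saw."""
    for item in items:
        pass  # `item` is bound only if `items` has at least one element
    return item  # raises UnboundLocalError when `items` was empty


print(last_item([1, 2, 3]))  # prints 3

try:
    last_item([])  # like soup.select(...) returning [] for a page without matches
except UnboundLocalError as exc:
    print(type(exc).__name__)  # prints UnboundLocalError
```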

Here is the code I'm using:

import requests
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor
import pandas as pd

daterange = pd.date_range('02-25-2015', '09-16-2020', freq='D')

def main(req, date):
    r = req.get(f"website/{date.strftime('%Y%m%d')}")
    soup = BeautifulSoup(r.content, 'html.parser')
    for tag in soup.select(".b-plainlist "):
        print(tag.select_one(".b-plainlist__date").text)
        print(tag.select_one(".b-plainlist__title").text)
        print(tag.find_next(class_="b-plainlist__announce").text.strip())
    
    # `tag` is only bound if the loop above ran at least once
    return tag.select_one(".b-plainlist__date").text, tag.select_one(".b-plainlist__title").text, tag.find_next(class_="b-plainlist__announce").text.strip()


with ThreadPoolExecutor(max_workers=30) as executor:
    with requests.Session() as req:
        fs = [executor.submit(main, req, date) for date in daterange]
        allin = []
        for f in fs:
            allin.append(f.result()) # the problem should be from here
        df = pd.DataFrame.from_records(
            allin, columns=["Date", "Title", "Content"])

I tried applying some of the changes from this post, but I don't think I fully understood how to fix it.

Update: here is the website's response (r.content) and the output of print(soup.select(".b-plainlist")):

b'\n\n\n \n HTTP 503\n\n

\n html{font-family: Helvetica Neue, Helvetica, Arial, sans-serif;}\n body{background-color: #fff; padding: 15px;}\n div.title{font-size: 32px; font-weight: bold; line-height: 1.2em;}\n div.sub-title{font-size: 25px;}\n div.descr{margin-top: 40px;}\n div.footer{margin-top: 80px; color: #777;}\n div.guru{font-size: 12px; color: #ccc;}\n\n\n \n Error 503\n Service Unavailable\n\n\nPlease try to access the site it.sputniknews.com again in a few minutes.

\nIf this happens several times, please contact the site administrators.

\n \n\n\n\n\n IP: 107.181.177.10\n Request: GET L3BvbGl0aWNhLzIwMTUwMzA4\n Guru meditation: MGV1SjNTaWhuUHNiblJYVU96QVpxMDB6N1hDNjU5NTU=\n \n\n\n\n\n\n\n'

Try declaring tag = None outside the for loop, like this:

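def main(req, date):
    r = req.get(f"website/{date.strftime('%Y%m%d')}")
    soup = BeautifulSoup(r.content, 'html.parser')
    tag = None  # bind the name before the loop, so it exists even if nothing matches
    for tag in soup.select(".b-plainlist "):
        ...  # loop body unchanged from the question
    if tag is None:  # no ".b-plainlist" element was found (e.g. an error page)
        return None
    return tag.select_one(".b-plainlist__date").text, tag.select_one(".b-plainlist__title").text, tag.find_next(class_="b-plainlist__announce").text.strip()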


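The error occurs when control never enters the loop, so the variable tag is never assigned. When you then try to return tag.select_one(".b-plainlist__date"), Python raises UnboundLocalError. Note that tag = None only binds the name: if the loop still never runs, tag.select_one(...) fails with AttributeError instead, so check for None before using it (and skip those results when building the DataFrame).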
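A sketch of a different approach (not the one proposed in this thread): collect one row per `.b-plainlist` element and return a list, which is simply empty when nothing matches, so no loop variable is read after the loop. Unlike the `return` after the loop in the question, which only reports the last matched `tag`, it keeps every entry. It reuses the selectors from the question; `parse_page` is a hypothetical name and the sample HTML is invented for illustration.

```python
from bs4 import BeautifulSoup


def parse_page(html):
    """Return a list of (date, title, announce) tuples, empty if nothing matched."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tag in soup.select(".b-plainlist"):
        date = tag.select_one(".b-plainlist__date")
        title = tag.select_one(".b-plainlist__title")
        announce = tag.find_next(class_="b-plainlist__announce")
        # Skip incomplete entries instead of failing with AttributeError on None.
        if date is None or title is None or announce is None:
            continue
        rows.append((date.text, title.text, announce.text.strip()))
    return rows  # an error page such as the HTTP 503 one simply gives []


sample = (
    '<ul class="b-plainlist"><li>'
    '<span class="b-plainlist__date">DATE</span>'
    '<a class="b-plainlist__title">Example title</a>'
    '<div class="b-plainlist__announce"> Example text </div>'
    '</li></ul>'
)
print(parse_page(sample))  # [('DATE', 'Example title', 'Example text')]
print(parse_page("<html><body>503 Service Unavailable</body></html>"))  # []
```

In the question's setup, main could return parse_page(r.content), and the per-date lists could then be flattened before building the DataFrame, for example allin = [row for f in fs for row in f.result()].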

I think the error is caused by this None. When I run the for loop and append the results to allin, something doesn't work.

Can you print the website's response? Please add print(r.content) right above soup = BeautifulSoup(...). I want to make sure you're actually getting a response. While you're at it, could you also print soup.select(".b-plainlist")?

I quickly looked through the HTML you provided and didn't see any tags with the classes mentioned above.

I'm not sure the classes I used are correct, so they may be wrong, though I can see output. The response is: b'. For the other print, see the question. Thanks for your help.

Response code 503 means Service Unavailable, so this is not a BeautifulSoup error: something is wrong with the request you're making with the requests module. Ideally, a default GET request should return a 200 code. I'd suggest calling the URL with Postman and debugging what's wrong with your request.
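Since the comments trace the failure to HTTP 503 rather than to BeautifulSoup, checking the status code before parsing makes the real problem visible. The sketch below is one possible approach, assuming the 503s are transient (the error page itself suggests trying again in a few minutes): it retries a few times with a growing delay and returns None if the server keeps refusing. fetch_html, retries and backoff are hypothetical names; session is assumed to be anything with a requests-style get method, such as requests.Session.

```python
import time


def fetch_html(session, url, retries=3, backoff=2.0):
    """Return the page body, or None if the server never answered with HTTP 200.

    Assumes `session` behaves like requests.Session: session.get(url) returns an
    object with `status_code` and `content` attributes.
    """
    for attempt in range(retries):
        r = session.get(url)
        if r.status_code == 200:
            return r.content
        if r.status_code != 503:
            break  # not a "try again later" error, so retrying is unlikely to help
        if attempt + 1 < retries:
            time.sleep(backoff * (attempt + 1))  # wait a little longer after each 503
    return None
```

In the question's main, this could replace the direct req.get(...) call, skipping (or logging) any date for which it returns None instead of parsing the error page. Sending fewer concurrent requests than max_workers=30 might also help, though nothing in the thread confirms that the request rate is what triggers the 503.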