Python code sometimes executes and sometimes does not


python python-3.x


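I am building a database (a pandas DataFrame) to store links to news articles from the past week for a list of companies. I have written some Python code, but it runs correctly some of the time and produces nothing at other times, without raising any error. Because it produces no log or error, I find it hard to understand what is going wrong.

I tried clearing the browser cache, since I am using Jupyter Notebook, and I also tried other applications such as Spyder. I have the same problem in Jupyter Notebook and in the other applications.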


import requests
import pandas as pd
from bs4 import BeautifulSoup

links_output = []

class Newspapr:
    def __init__(self, term):
        self.term = term
        self.url = 'https://www.google.com/search?q={0}&safe=active&tbs=qdr:w,sdb:1&tbm=nws&source=lnt&dpr=1'.format(self.term)

    def NewsArticlerun(self):
        response = requests.get(self.url)
        soup = BeautifulSoup(response.text, 'html.parser')
        links = soup.select(".r a")
        numOpen = min(5, len(links))
        for i in range(numOpen):
            response_links = "https://www.google.com" + links[i].get("href")
            print(response_links)
            links_output.append({"Weblink": response_links})
        pd.DataFrame.from_dict(links_output)

list_of_companies = ["Wipro", "Reliance", "icici bank", "vedanta", "DHFL", "yesbank", "tata motors", "tata steel", "IL&FS", "Jet airways", "apollo tyres", "ashok leyland", "Larson & Turbo", "Mindtree", "Infosys", "TCS", "AxisBank", "Mahindra & Mahindra"]
for i in list_of_companies:
    comp_list = str('"' + i + '"')
    call_code = Newspapr(comp_list)
    call_code.NewsArticlerun()

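I expect the web links to be printed as a pandas DataFrame.

First of all, the naming convention of your function was wrong, so I have changed it.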

The function does not return anything, so return the DataFrame from it:

def newsArticlerun(self):
    response=requests.get(self.url)
    soup=BeautifulSoup(response.text,'html.parser')
    links=soup.select(".r a")

    numOpen = min(5, len(links))
    for i in range(numOpen):
        response_links = "https://www.google.com" + links[i].get("href")
        print(response_links)
        links_output.append({"Weblink":response_links})
    return pd.DataFrame.from_dict(links_output) # this will return your df

To print the result, add a print call:

for i in list_of_companies:
    comp_list = str('"' + i + '"')
    call_code = Newspapr(comp_list)
    print(call_code.newsArticlerun())  # here; uses the renamed method

Note: this is why you are not getting any results. The response contains the following page:

<div style="font-size:13px;">
<b>About this page</b><br/><br/>Our systems have detected unusual traffic from your computer network.  This page checks to see if it's really you sending the requests, and not a robot.  <a href="#" onclick="document.getElementById('infoDiv').style.display='block';">Why did this happen?</a><br/><br/>
<div id="infoDiv" style="display:none; background-color:#eee; padding:10px; margin:0 0 15px 0; line-height:1.4em;">
This page appears when Google automatically detects requests coming from your computer network which appear to be in violation of the <a href="//www.google.com/policies/terms/">Terms of Service</a>. The block will expire shortly after those requests stop.  In the meantime, solving the above CAPTCHA will let you continue to use our services.<br/><br/>This traffic may have been sent by malicious software, a browser plug-in, or a script that sends automated requests.  If you share your network connection, ask your administrator for help — a different computer using the same IP address may be responsible.  <a href="//support.google.com/websearch/answer/86640">Learn more</a><br/><br/>Sometimes you may be asked to solve the CAPTCHA if you are using advanced terms that robots are known to use, or sending requests very quickly.
</div>
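A minimal sketch of how the script could notice this page instead of silently finding zero links. The function names and the marker phrases are assumptions made for illustration; the phrases are copied from the page text quoted above and are not a documented Google contract, so they may change.

```python
# Phrases taken from the block page quoted above. Treating them as
# markers is an assumption for illustration only.
BLOCK_MARKERS = (
    "unusual traffic from your computer network",
    "solving the above captcha",
)


def looks_like_block_page(html: str) -> bool:
    """Return True if the HTML contains text from the CAPTCHA page."""
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


def check_response_text(html: str) -> str:
    """Raise instead of silently continuing when the block page is served."""
    if looks_like_block_page(html):
        raise RuntimeError("Search request was blocked: CAPTCHA page returned")
    return html
```

Passing response.text through check_response_text before parsing it would turn the silent "no output" case into a visible error.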


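About this page

Our systems have detected unusual traffic from your computer network. This page checks to see if it's really you sending the requests, and not a robot.

This page appears when Google automatically detects requests coming from your computer network which appear to be in violation of the Terms of Service. The block will expire shortly after those requests stop. In the meantime, solving the above CAPTCHA will let you continue to use our services.

This traffic may have been sent by malicious software, a browser plug-in, or a script that sends automated requests. If you share your network connection, ask your administrator for help; a different computer using the same IP address may be responsible.

Sometimes you may be asked to solve the CAPTCHA if you are using advanced terms that robots are known to use, or sending requests very quickly.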

I think you triggered Google Search's anti-spam countermeasures. Adding a delay between your requests might help.
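One way the delay could be added is sketched below. The helper name fetch_with_delay, its parameters, and the 5 to 10 second range are assumptions chosen for illustration; nothing here guarantees that the block is avoided. The fetch argument is any function that takes a search term and returns the page text.

```python
import random
import time
from typing import Callable, Dict, Iterable


def fetch_with_delay(
    terms: Iterable[str],
    fetch: Callable[[str], str],
    min_delay: float = 5.0,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Call fetch(term) for each term, pausing a random interval between calls."""
    results: Dict[str, str] = {}
    for index, term in enumerate(terms):
        if index > 0:
            # Pause before every request except the first one.
            sleep(random.uniform(min_delay, max_delay))
        results[term] = fetch(term)
    return results
```

In the question's code, fetch could be a small function that formats the search URL with the term and returns requests.get(url).text.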

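Edit: As mentioned, use the official Google API.

Edit2: See this post for an in-depth answer:

Edit3: To make this more useful, you should clarify the nature of your question by prefixing the title with "Google Search".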

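Where are you storing the DataFrame? The pandas DataFrame is not being updated. First of all, I do not see any DataFrame being stored in your code. Store it in a variable and save it on your computer in csv or pickle format. After that, append the new data to the DataFrame.

Thanks for the suggestion. Even if I remove the line "pd.DataFrame.from_dict(links_output)", this code sometimes produces no output. It does not print the web links. I have used it before, and even when I use your code it produces no output. It is hard to understand the issue.

@user11546528 Yes, you do not see anything because your numOpen is always zero. Your for loop is not executed, so links_output is empty.

Yes, that is the problem: sometimes I get output and sometimes I do not. This afternoon I got output, and now I am getting nothing. It seems weird.

@user11546528, the problem is due to the CAPTCHA. Please read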
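The suggestion above to save the links to a CSV file and append new data on later runs could look roughly like this. The sketch uses the standard-library csv module rather than pandas so that it runs without extra dependencies; the function name append_links and the file path are assumptions. With pandas, a comparable effect is usually achieved with DataFrame.to_csv using mode="a" and writing the header only once.

```python
import csv
import os
from typing import Dict, Iterable


def append_links(path: str, rows: Iterable[Dict[str, str]]) -> int:
    """Append rows with a 'Weblink' column to a CSV file; return rows written."""
    # Write the header only when the file is new or empty.
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    count = 0
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["Weblink"])
        if write_header:
            writer.writeheader()
        for row in rows:
            writer.writerow(row)
            count += 1
    return count
```

Calling append_links("links.csv", links_output) after each run would keep earlier results on disk even when a later run returns nothing.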