Python: web scraping a React web application after componentDidMount

Tags: python, reactjs, web-scraping, beautifulsoup

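I'm currently using Python 3 to learn web scraping and have run into a strange issue. For context, I'm trying to scrape various news headlines and extract some data from them. I'm using the requests and BeautifulSoup libraries.

My response contains nothing of substance. When I send a simple request: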

import requests
from bs4 import BeautifulSoup

response = requests.get('https://www.cnn.com/')
soup = BeautifulSoup(response.text, 'html.parser')
print(soup)
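I get back a lot of what looks like CSS and some JS. Only at the bottom do I see a div, which I assume React renders into. The problem is that I can't get the data this way. I think CNN uses something like useEffect or componentDidMount to populate the data, which means it isn't present in the initial DOM. That's obviously not a concern for a human user, but it causes problems here.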
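One way to check this suspicion is to count the heading tags in the raw HTML that requests receives. The sketch below runs on a made-up stand-in for a client-rendered page shell; `SAMPLE_SHELL`, the `id="root"` container, and the `count_tags` helper are all hypothetical and not taken from CNN's markup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a client-rendered page shell (not CNN's real markup).
SAMPLE_SHELL = """
<html>
  <head>
    <style>body { margin: 0; }</style>
    <script>window.__INITIAL_STATE__ = {};</script>
  </head>
  <body><div id="root"></div></body>
</html>
"""


def count_tags(html, names):
    """Return how many tags with any of the given names appear in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all(names))


# No headings in the static markup suggests they are injected later by JavaScript.
print(count_tags(SAMPLE_SHELL, ["h2", "h3"]))
```

Passing `response.text` from the request above instead of `SAMPLE_SHELL` gives the same check on the real page.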


What can I do to get around this?

In the Chrome developer console, if you check the Network tab, you'll see a number of requests ending in /zone-manager.izl:

The response is JSON with an html field that contains some HTML content (including the headlines we're looking for).
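As a rough illustration of that payload shape, the sketch below pulls the html field out of a JSON string. The sample payload and the `html_from_payload` helper are hypothetical; the real responses presumably carry other fields as well, which are not shown here.

```python
import json


def html_from_payload(raw):
    """Return the embedded HTML fragment, or an empty string if the field is absent."""
    payload = json.loads(raw)
    return payload.get("html", "")


# Hypothetical payload, reduced to the one field discussed above.
sample = '{"html": "<h2>Example headline</h2>"}'
print(html_from_payload(sample))
```

With requests, `r.json()["html"]` does the same thing in one step; the `.get` fallback here only avoids a KeyError if a zone comes back without that field.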

The content is split across 4 zones and 2 URL formats. Here is sample code that fetches all of them:

import requests

pageType1 = "_intl-homepage-zone-injection/index.html:intl_homepage-injection-zone"
pageType2 = "index.html:intl_homepage1-zone"

for i in range(1,5):
    r = requests.get(f"https://edition.cnn.com/data/ocs/section/{pageType1}-{i}/views/zones/common/zone-manager.izl")
    print(r.json()["html"])
    r = requests.get(f"https://edition.cnn.com/data/ocs/section/{pageType2}-{i}/views/zones/common/zone-manager.izl")
    print(r.json()["html"])
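The loop above can also be written so that the URLs are built separately from the fetching, which makes them easy to inspect before any request is sent. This is only a sketch: `zone_urls` is a hypothetical helper, and the endpoint pattern is copied from the code above as it was observed at the time, so it may no longer be served.

```python
BASE = "https://edition.cnn.com/data/ocs/section"
SUFFIX = "views/zones/common/zone-manager.izl"

# The two URL formats from the snippet above.
PAGE_TYPES = (
    "_intl-homepage-zone-injection/index.html:intl_homepage-injection-zone",
    "index.html:intl_homepage1-zone",
)


def zone_urls(zones=range(1, 5)):
    """Build one URL per (zone, page type) pair, in the same order as the loop above."""
    return [f"{BASE}/{page_type}-{i}/{SUFFIX}" for i in zones for page_type in PAGE_TYPES]


for url in zone_urls():
    print(url)
```

Each URL can then be passed to `requests.get(url, timeout=10)`, with `raise_for_status()` called on the response before reading the JSON.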
The URL that seems to return the headlines is:

You can then use BeautifulSoup or any other HTML parser to extract the data.

For example, to get the h2 and h3 tags (i.e. the headlines):
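The snippet for this step is missing here, so what follows is a minimal sketch rather than the original code. It assumes the html field has already been fetched as a string; `extract_headlines` and the sample fragment are hypothetical. The output shape (one headline, then a list) follows the output shown in this answer, which suggests a single h2 and several h3 tags.

```python
from bs4 import BeautifulSoup


def extract_headlines(html):
    """Return (text of the first h2 or None, list of all h3 texts)."""
    soup = BeautifulSoup(html, "html.parser")
    top = soup.find("h2")
    top_text = top.get_text(strip=True) if top else None
    others = [h3.get_text(strip=True) for h3 in soup.find_all("h3")]
    return top_text, others


# Hypothetical fragment standing in for r.json()["html"].
sample = "<div><h2>Top story</h2><ul><li><h3> First </h3></li><li><h3>Second</h3></li></ul></div>"
top, others = extract_headlines(sample)
print(top)
print(others)
```

With a real response, calling `extract_headlines(r.json()["html"])` should give a result shaped like the output in this answer.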

Output:

Military leaders take a stand as Trump stays silent
['The US military -- which Trump often uses to bolster himself as a commander in chief -- is moving on from the President on racial inequality', 'Derek Chauvin eligible for $1M pension', 'Live Protests continue to grow across the US', "analysis Floyd protests have a plot twist I didn't see coming", "Fox News anchor calls out Trump for saying he's done more for African Americans than any president", 'What if the next Donald Trump is, well, Donald Trump?', "Cuomo: Proof of systemic racism is in Trump's Cabinet", "Videos raise question about in-custody death deemed an 'accident' by officials", 'Woman caught on video harassing Asian American exercising in park', 'The Tyrion Lannister lookalike dreaming of Bollywood stardom', 'New book about Melania Trump says she renegotiated her prenuptial agreement', 'Young Americans are having less sex', "Kareem Abdul-Jabbar's son arrested for allegedly stabbing neighbor", 'Outrage over single mother who died after waiting days for bus home during lockdown', 'Face masks are best way to reduce coronavirus transmission, study finds', 'Stunning images show how virus is overrunning hospitals', 'Achaeologist jailed for faking finds', 'Poland invaded Czech Republic last month, says it was just a misunderstanding']

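Very cool. I hadn't thought of looking at the network requests to find the relevant data. I'll definitely keep this in mind for future web scraping endeavors :)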