Web scraping: unable to get a specific tag when using Beautiful Soup [web-scraping, beautifulsoup, html-parsing]


I want to extract information from the Stack Overflow website. To get the text of the questions, I did:

import requests
from bs4 import BeautifulSoup
response=requests.get("https://stackoverflow.com/")
soup=BeautifulSoup(response.text,"html.parser",multi_valued_attributes=None)

for tag in soup.find_all('a',class_='question-hyperlink'):
    print(tag)

This produces no output at all. I think something goes wrong when filtering by class, but I'm not sure what.

This works fine:

import requests  
from bs4 import BeautifulSoup
response=requests.get("https://stackoverflow.com/questions")
soup=BeautifulSoup(response.text,"html.parser")
question=soup.select(".question-summary")

for a in question:
    print(a.select_one(".question-hyperlink").getText())

# Alternative: the same lookup with find_all instead of CSS selectors
import requests
from bs4 import BeautifulSoup

response = requests.get("https://stackoverflow.com/questions")
soup = BeautifulSoup(response.text, "html.parser")

for tag in soup.find_all('a', class_='question-hyperlink'):
    print(tag.getText(strip=True))

But what is wrong with the previous one?

You are missing "questions" at the end of the URL on this line of your first snippet:

response=requests.get("https://stackoverflow.com/")

This works:

import requests  
from bs4 import BeautifulSoup
response=requests.get("https://stackoverflow.com/questions")
soup=BeautifulSoup(response.text,"html.parser")
question=soup.select(".question-summary")

for a in question:
    print(a.select_one(".question-hyperlink").getText())

# Alternative: the same lookup with find_all instead of CSS selectors
import requests
from bs4 import BeautifulSoup

response = requests.get("https://stackoverflow.com/questions")
soup = BeautifulSoup(response.text, "html.parser")

for tag in soup.find_all('a', class_='question-hyperlink'):
    print(tag.getText(strip=True))

Output:

Pass a json object in function as a variable
iPhone Application Development in Windows 10 Platform
Jetty Websocket API Session
Exit from a multiprocessing Pool for loop using apply_async and terminate
bootstrap 5 grid layout col-md-6 not working correctly
R comparison (1) is possible only for atomic and list types
NeutralinoJS: error: missing required argument 'name'
Formatting text editor with Elementor

and so on ...

Otherwise, the page you fetch contains no anchor tag with that class.
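
A quick way to confirm this kind of problem is to count how many tags match before printing anything. The sketch below is only an illustration: HOME_HTML and LIST_HTML are made-up, heavily simplified stand-ins for the two pages (the class name "s-link" is an assumption, not taken from the real site), and count_class is a hypothetical helper, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the two pages.
# The real Stack Overflow HTML is much larger and may differ.
HOME_HTML = '<div><a class="s-link" href="/q/1">A question</a></div>'
LIST_HTML = (
    '<div class="question-summary">'
    '<a class="question-hyperlink" href="/q/1">A question</a>'
    '</div>'
)


def count_class(html, tag, css_class):
    """Return how many <tag> elements in html carry css_class."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all(tag, class_=css_class))


# No match in the first document: a loop over find_all would print nothing.
print(count_class(HOME_HTML, "a", "question-hyperlink"))
# One match in the second document.
print(count_class(LIST_HTML, "a", "question-hyperlink"))
```

In practice you would pass response.text from requests.get(...) instead of the sample strings. A count of zero suggests that the HTML you received does not contain the class at all, rather than that the filter is wrong.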

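It works. But I still don't understand: when we go to "" and inspect the element, it shows the HTML, right? So what is the difference?

Because you need to be logged in to see that content on the page. Try logging out and navigating to that page to see what it shows, @optimistic zia.

@SIM I get it now.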