How to print or extract the text in a div class with any <p> & <span>, or with BeautifulSoup and Python 3.x?

python-3.x web-scraping

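Suppose I have text like this in a div class:

Name

I tried, but it didn't work. I need to extract the name followed by the text in the col span-9 div. Here is my code: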

import requests
from bs4 import BeautifulSoup

url = "https://v2.sherpa.ac.uk/id/publisher/1939?template=romeo"

r = requests.get(url)
htmlContent = r.content

soup = BeautifulSoup(htmlContent, 'html.parser')
title = soup.title
print(title)

div_text = soup.find("div", {"class": "col span-3"}).get_text()
div_text = soup.find("div", {"class": "col span-9"}).get_text()
print(div_text)

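When I use

div_text = soup.find("div", {"class": "col span-3"})
print(div_text)

I get the result with all of its tags. But when I add .get_text(), it only gives the first label name. And when I use both col span-3 and col span-9 to get the text, only the text of the span-9 class is printed.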

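It gives only one result, "1066 Tidskrift for historie [English]", which comes from col span-9, and not the header. I need output like "Name: 1066 Tidskrift for historie [English]; URL: ; Country: Denmark; Number of publications: 1".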
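As a side illustration of the behaviour described above, the sketch below runs on a small inline HTML snippet instead of the live page; the markup is an assumption modelled on the Name/Country layout in the question. It shows that find() returns only the first matching div while find_all() returns all of them, and that an attrs value containing a space, such as "col span-3", is matched against the whole class string.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the label/value layout in the question;
# the real page at v2.sherpa.ac.uk may be structured differently.
html = """
<div class="col span-3">Name</div>
<div class="col span-9">1066 Tidskrift for historie [English]</div>
<div class="col span-3">Country</div>
<div class="col span-9">Denmark</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() stops at the first matching tag, so only the first label comes back.
first_header = soup.find("div", {"class": "col span-3"}).get_text()

# find_all() returns every matching tag as a list.
all_headers = [div.get_text() for div in soup.find_all("div", {"class": "col span-3"})]

print(first_header)
print(all_headers)
```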

You are overwriting div_text when you assign to it the second time. Try this:

div_text_header = soup.find("div", {"class": "col span-3"}).get_text()
div_text_value = soup.find("div", {"class": "col span-9"}).get_text()
print(div_text_header)
print(div_text_value)

For the actual output you want, you can do this:

print(f'{div_text_header}: {div_text_value}')

It looks like you're trying to get this for every field on the page. This should work:

div_headers = soup.find_all("div", {"class": "col span-3"})
div_values = soup.find_all("div", {"class": "col span-9"})
for header, value in zip(div_headers, div_values):
  print(f'{header.get_text()}: {value.get_text()}')
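A variant of the loop above, sketched on assumed inline markup instead of the live page: soup.select() with a CSS selector matches divs carrying both classes regardless of their order, get_text(strip=True) trims the whitespace around each value, and "; ".join() produces the single-line "Label: value; Label: value" output asked for in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the label/value layout described above.
html = """
<div class="col span-3">Name</div>
<div class="col span-9">
  1066 Tidskrift for historie [English]
</div>
<div class="col span-3">Country</div>
<div class="col span-9">Denmark</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "div.col.span-3" matches divs that carry both classes, in any order.
headers = soup.select("div.col.span-3")
values = soup.select("div.col.span-9")

# zip() pairs headers and values by position; strip=True drops the
# surrounding newlines and indentation from each text.
pairs = [(h.get_text(strip=True), v.get_text(strip=True)) for h, v in zip(headers, values)]

# One line in the "Label: value; Label: value" shape from the question.
line = "; ".join(f"{label}: {value}" for label, value in pairs)
print(line)
```

Note that zip() pairs headers and values purely by position, so a label without a matching value column would shift every pair after it.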

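Name: 1066 Tidskrift for historie [English] URL: Country: Denmark Number of publications: 1 [view] Publication: 1066 Tidskrift for historie ID: 1939 Date created: 29 May 2014 09:09:37 UTC Last modified: 13 March 2019 11:14:41 UTC URI:

It gives the above result. In "Number of publications: 1 [view]" there is a link to another web page, and that page links to yet another one. How can I get the data from the first page through to the last one, and how do I connect that to the script above? Is there a way to create multiple soups in one script, or some other way to get all these results into one CSV file?

Glad it worked for you! Would you mind upvoting and accepting my answer? I'm trying to get enough reputation to comment. Thanks a lot. As for your second question, when there is a link you can add a line inside the for loop, something like this: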

if value.find('a'):
  new_url = value.find('a').attrs['href']

With that new URL, you can create a new soup:

newSoup = BeautifulSoup(requests.get(new_url).content, 'html.parser')
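For the follow-up about following links across pages and collecting everything into one CSV file, here is a minimal sketch of one possible approach (a breadth-first crawl, not the answerer's code). Assumptions: the helper names extract_pairs, linked_urls, crawl and write_csv are hypothetical, the example.org URLs and markup are invented so the demo runs offline, and fetch is any callable that returns the HTML for a URL. Relative hrefs are resolved with urllib.parse.urljoin, which the one-liner above does not do.

```python
import csv
import io
from collections import deque
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_pairs(soup):
    """(label, value) pairs from the col span-3 / col span-9 layout."""
    headers = soup.select("div.col.span-3")
    values = soup.select("div.col.span-9")
    return [(h.get_text(strip=True), v.get_text(strip=True))
            for h, v in zip(headers, values)]


def linked_urls(soup, base_url):
    """Absolute URLs of the first link found in each value column."""
    urls = []
    for value in soup.select("div.col.span-9"):
        link = value.find("a", href=True)
        if link:
            # href may be relative, so resolve it against the current page.
            urls.append(urljoin(base_url, link["href"]))
    return urls


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first walk over linked pages, collecting (page, label, value) rows.

    fetch(url) must return the page's HTML as a string; against the live
    site it could be: lambda u: requests.get(u, timeout=10).text
    """
    seen = set()
    queue = deque([start_url])
    rows = []
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        soup = BeautifulSoup(fetch(url), "html.parser")
        rows.extend((url, label, value) for label, value in extract_pairs(soup))
        queue.extend(linked_urls(soup, url))
    return rows


def write_csv(rows, fileobj):
    """Write all rows, preceded by a header line, to one CSV file object."""
    writer = csv.writer(fileobj)
    writer.writerow(["page", "label", "value"])
    writer.writerows(rows)


# Offline demo: two hypothetical pages stand in for real HTTP responses.
pages = {
    "https://example.org/publisher/1": (
        '<div class="col span-3">Name</div>'
        '<div class="col span-9">Example journal</div>'
        '<div class="col span-3">Number of publications</div>'
        '<div class="col span-9"><a href="/publication/7">1</a></div>'
    ),
    "https://example.org/publication/7": (
        '<div class="col span-3">Title</div>'
        '<div class="col span-9">Example title</div>'
    ),
}

rows = crawl("https://example.org/publisher/1", pages.__getitem__)
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

To write a real file, open it with newline="" as the csv module recommends, e.g. with open("results.csv", "w", newline="", encoding="utf-8") as f: write_csv(rows, f). The seen set stops link cycles, and max_pages caps how many pages are fetched in total.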