Python: How to extract HTML paragraph elements only when they contain a bold element


I am trying to extract the paragraph elements under ID='See' from a Wikipedia page and collect them all into a list.

Using:

import bs4
import requests

response = requests.get("https://wikitravel.org/en/Bhopal")
response.raise_for_status()  # requests.get never returns None; fail on HTTP errors instead

html = bs4.BeautifulSoup(response.text, 'html.parser')
plot = []

# find the node with id of "See"
mark = html.find(id="See")

# walk through the siblings of the parent (H2) node
# until we reach the next H2 node
for elt in mark.parent.next_siblings:
    if elt.name == "h2":
        break
    if hasattr(elt, "text"):
        plot.append(elt.text)

Now I want to extract only the paragraphs that contain a bold element. How can I achieve that here?

Is this what you are looking for? I added a few lines to your code. I used the lxml parser (html.parser works as well).

The code as run in my Jupyter notebook:

from bs4 import BeautifulSoup as bs
from bs4 import Tag
import requests

url = 'https://wikitravel.org/en/Bhopal'
content = requests.get(url).text
soup = bs(content, 'lxml')  # requires the lxml package; 'html.parser' also works

plot = []
mark = soup.find(id="See")

# Walk through the siblings of the parent (H2) node
# until we reach the next H2 node
for elt in mark.parent.next_siblings:
    if elt.name == "h2":
        break
    # Only real tags can contain <b>; on a bare string, .find() is str.find()
    # and returns -1 (truthy) when nothing matches, so strings must be skipped
    if isinstance(elt, Tag) and elt.find('b'):
        plot.append(elt.text)
print(*plot, sep='\n')  # Just to print the list in a readable way
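
To try the same idea without hitting the network, here is a minimal sketch that runs against a small inline HTML string. The sample markup and the helper name bold_paragraphs are made up for illustration and are not taken from the real page. It also assumes you want only <p> elements, so other tags that happen to contain bold text (such as list items) are skipped.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical sample markup standing in for the real page
SAMPLE_HTML = """
<h2><span id="See">See</span></h2>
<p><b>Upper Lake</b> is a large lake.</p>
<p>No bold text here.</p>
<ul><li><b>Bold in a list item</b></li></ul>
<p>Visit the <b>museum</b> too.</p>
<h2><span id="Do">Do</span></h2>
<p><b>Boating</b> belongs to the next section.</p>
"""


def bold_paragraphs(html, section_id):
    """Return the text of each <p> that follows the heading holding
    `section_id` (up to the next <h2>) and contains a <b> element."""
    soup = BeautifulSoup(html, "html.parser")
    mark = soup.find(id=section_id)
    if mark is None:
        return []  # no such section on the page

    results = []
    for elt in mark.parent.next_siblings:
        if not isinstance(elt, Tag):
            continue  # skip whitespace and other bare strings
        if elt.name == "h2":
            break  # reached the next section
        if elt.name == "p" and elt.find("b") is not None:
            results.append(elt.get_text().strip())
    return results


print(bold_paragraphs(SAMPLE_HTML, "See"))
```

Checking isinstance(elt, Tag) before calling find keeps the loop from treating the whitespace strings between tags as matches, and the elt.name == "p" test can be dropped if any element with bold text should be kept.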