Python: Extracting data between specific comment nodes with BeautifulSoup 4

python web-scraping

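I want to pick out specific data from a website, such as prices and company information. Luckily, the site's designer has placed plenty of marker comments in the markup, for example: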

<!-- Begin Services Table -->
' desired data
<!-- End Services Table -->

Here is the website in question:

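If you want to select the <table> element that follows one of these specific comments, you can select all comment nodes, filter them by the desired text, and then select the next sibling <table> element: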

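import requests
from bs4 import BeautifulSoup, Comment

# `url` must hold the address of the page to scrape; the post does not show it.
response = requests.get(url)
soup = BeautifulSoup(response.content, "lxml")

# Collect every comment node in the document
comments = soup.find_all(string=lambda text: isinstance(text, Comment))

for comment in comments:
    # Comment text keeps its surrounding spaces, so strip before comparing
    if comment.strip() == 'Begin Services Table':
        # First <table> after the comment at the same nesting level
        table = comment.find_next_sibling('table')
        print(table)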
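
The same lookup can be tried offline against a small inline document. The following is only a minimal sketch: SAMPLE_HTML and the helper name table_after_comment are made up for illustration (the real page's markup is not shown in the post), and it uses the built-in "html.parser" instead of "lxml" so that it has no extra dependency.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup standing in for the real page, which is not shown here.
SAMPLE_HTML = """
<div>
  <!-- Begin Services Table -->
  <table id="services"><tr><td>Hosting</td><td>$10</td></tr></table>
  <!-- End Services Table -->
  <!-- Begin Contact Table -->
  <table id="contact"><tr><td>Acme Ltd</td></tr></table>
  <!-- End Contact Table -->
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")


def table_after_comment(soup, marker):
    """Return the first <table> sibling after the comment whose text is `marker`.

    Returns None when no such comment (or no following table) exists.
    """
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        if comment.strip() == marker:
            return comment.find_next_sibling("table")
    return None


services = table_after_comment(soup, "Begin Services Table")
# Pull the cell text out of the table that was found
cells = [td.get_text(strip=True) for td in services.find_all("td")]
print(cells)
```

Note that find_next_sibling only looks at nodes sharing the comment's parent, so this works when the table sits at the same nesting level as the comment.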

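Alternatively, if you want to get all of the data between the two comments, you can find the opening comment and then iterate over the siblings that follow it until you reach the closing comment: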

import requests
from bs4 import BeautifulSoup, Comment

# `url` must hold the address of the page to scrape; the post does not show it.
response = requests.get(url)
soup = BeautifulSoup(response.content, "lxml")

data = []

for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    if comment.strip() == 'Begin Services Table':
        # Walk the nodes that follow the opening comment at the same level
        for node in comment.next_siblings:
            # Stop as soon as the closing comment is reached
            if isinstance(node, Comment) and node.strip() == 'End Services Table':
                break
            data.append(node)

print(data)
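
The collected list usually mixes whitespace text nodes with tags, so a typical next step is to keep only the tags and read the table rows from them. Below is a hedged sketch of that step, again run against made-up markup: SAMPLE_HTML and the helper nodes_between are illustrative names, not part of the original answer, and the sketch assumes that both comments share the same parent element.

```python
from bs4 import BeautifulSoup, Comment, Tag

# Hypothetical markup standing in for the real page, which is not shown here.
SAMPLE_HTML = """
<div>
  <p>Intro text outside the markers</p>
  <!-- Begin Services Table -->
  <h2>Services</h2>
  <table><tr><td>Hosting</td><td>$10</td></tr><tr><td>Support</td><td>$5</td></tr></table>
  <!-- End Services Table -->
  <p>Footer text outside the markers</p>
</div>
"""


def nodes_between(soup, begin, end):
    """Return the sibling nodes located between the `begin` and `end` comments.

    Assumes both comments have the same parent; returns [] if `begin` is absent.
    """
    start = soup.find(
        string=lambda text: isinstance(text, Comment) and text.strip() == begin
    )
    if start is None:
        return []
    collected = []
    for node in start.next_siblings:
        if isinstance(node, Comment) and node.strip() == end:
            break
        collected.append(node)
    return collected


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
nodes = nodes_between(soup, "Begin Services Table", "End Services Table")

# Drop whitespace-only text nodes and keep real elements
tags = [node for node in nodes if isinstance(node, Tag)]

# Flatten every <tr> found inside those elements into a list of cell strings
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tag in tags
    for tr in tag.find_all("tr")
]
print(rows)
```

If the closing comment is nested deeper or shallower than the opening one, sibling iteration never meets it and the loop simply runs to the end of the parent's children, so the result should be checked against the actual page structure.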

SO is not a free coding service. You have to try to solve the problem yourself. If you can't get it working, post what you tried and we'll help you fix it.

Sorry @Barmar, I forgot to post my original code!