Python: selecting a div with BeautifulSoup
Hello, I have HTML like the one below, and when I parse it with BeautifulSoup I can't select the text by class. I think the problem is that the nested tags are not being recognized as its children. How can I select the text of the span tag? Thanks.
<div data-component="new_enquiry_form_app" data-props="{"isTelRequired":false,"placement":"top",}">
<section class="enquiry-form-box__wrapper">
<div class="enquiry-form-box enquiry-form-box--inverted">
<form class="enquiry-form-box__form" tabindex="-1">
<fieldset class="enquiry-form-box__wrapper">
<div class="enquiry-form-box__fields">
<div class="k-ns">
<span class="text-gray block mt-3 font-bold text-sm">Property reference: 412</span>
</div>
</div>
</fieldset>
</form>
</div>
</section>
Property reference: 412
Try this:
from bs4 import BeautifulSoup
html = '''<div data-component="new_enquiry_form_app" data-props="{"isTelRequired":false,"placement":"top",}">
<section class="enquiry-form-box__wrapper">
<div class="enquiry-form-box enquiry-form-box--inverted">
<form class="enquiry-form-box__form" tabindex="-1">
<fieldset class="enquiry-form-box__wrapper">
<div class="enquiry-form-box__fields">
<div class="k-ns">
<span class="text-gray block mt-3 font-bold text-sm">Property reference: 412</span>
</div>
</div>
</fieldset>
</form>
</div>
</section>'''
soup = BeautifulSoup(html, 'html.parser')
span = soup.select_one('span.text-gray.block.mt-3.font-bold.text-sm')
print(span.get_text())
Then this is one way:
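from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

# Selenium 4 style; the geckodriver path is machine-specific, adjust as needed.
driver = webdriver.Firefox(service=Service(executable_path='c:/program/geckodriver.exe'))
driver.get('https://www.kyero.com/en/property/7689206-villa-for-sale-sant-joan-de-labritja')
# A real browser runs the page's JavaScript, so the rendered span is available.
span = driver.find_element(By.CSS_SELECTOR, 'span.text-gray.block.mt-3.font-bold.text-sm')
print(span.text)
driver.quit()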
Prints:
Property reference: 412
Note that in this code geckodriver is loaded from c:/program/geckodriver.exe.
@Andrej Kesely was faster with the other answer, so I am giving a Selenium answer.
To print the reference label, you can use this script (the data is stored in a JavaScript variable inside the HTML document):
Prints:
Property reference: 412
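As a rough sketch of the general idea only (reading data out of a JavaScript variable embedded in the HTML, rather than from the rendered span), something like the following might work. The variable name `propertyData`, its keys, and the sample page source are all hypothetical, made up for illustration; the real page would need to be inspected to find the actual names.

```python
import json
import re

# Hypothetical page source: the variable name `propertyData` and its keys
# are invented for this example and are not taken from the real site.
html = '''
<html><body>
<script>
var propertyData = {"reference": "412", "isTelRequired": false};
</script>
<div data-component="new_enquiry_form_app"></div>
</body></html>
'''


def extract_js_object(source, var_name):
    """Return the JSON object assigned to `var_name` in `source`, or None.

    This relies on the assigned value being valid JSON and on the first
    '}' followed by ';' closing the object; it is not a JavaScript parser.
    """
    pattern = r'var\s+' + re.escape(var_name) + r'\s*=\s*(\{.*?\})\s*;'
    match = re.search(pattern, source, flags=re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1))


data = extract_js_object(html, 'propertyData')
label = None
if data is not None:
    label = 'Property reference: ' + data['reference']
    print(label)
```

Because the data is read straight from the page source, this approach does not need a browser, but it breaks as soon as the site renames the variable or changes its structure.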
What text do you want? Property reference: 412?
@MendelG Yes, Property reference: 412.
Thanks for the answer. On its own this code prints "Property reference: 412", but unfortunately when I try it on the whole web page it prints nothing.
Then give me the URL, @Arthur.
Updated my answer, @Arthur.
Thanks a lot, but I'm still curious: why is there no way to select just the span, or the div in this case, as in other cases?
The data is loaded with JavaScript, and requests does not run JavaScript.
Thanks for the answer!