Extracting a value while web scraping with Python Beautiful Soup

python html css web-scraping

How can I extract the value "1.00 TK = 779.8" from the HTML below?

I tried the code below, but it does not work:

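import requests
from bs4 import BeautifulSoup

# "<url>" is a placeholder for the address of the page being scraped
page = requests.get("<url>").text

# The relevant part of the page's HTML looks like this:
# <span _ngcontent-his-c101="" id="driveValue" class="ng-binding ng-scope"> 1.00 TK = 779.8<span _ngcontent-his-c101="">Disk Drive Value</span>(DDV) </span>

# Parse the downloaded text (the original passed an undefined name, html)
soup = BeautifulSoup(page, 'html.parser')
print(soup.find(id='driveValue').find_next(text=True).strip())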

Use find_next(), which returns the first match:

from bs4 import BeautifulSoup

html = '''<span _ngcontent-his-c101="" id="driveValue" class="ng-binding ng-scope"> 1.00 TK = 779.8<span _ngcontent-his-c101="">Disk Drive Value</span>(DDV) </span>'''

soup = BeautifulSoup(html, 'html.parser')
print(soup.find(id='driveValue').find_next(text=True).strip())

Output:

1.00 TK = 779.8

Output after the edit:

1.00 USD = 73.9375 Indian Rupee (INR)
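For comparison, here is a minimal sketch of another way to reach the same text node with Beautiful Soup, using the tag's stripped_strings generator instead of find_next(). The final step, which splits the text on "=" to get a number, is only an illustration and assumes the text always has the form "A = B"; the names first_text and rate are made up for this example.

```python
from bs4 import BeautifulSoup

html = '''<span _ngcontent-his-c101="" id="driveValue" class="ng-binding ng-scope"> 1.00 TK = 779.8<span _ngcontent-his-c101="">Disk Drive Value</span>(DDV) </span>'''

soup = BeautifulSoup(html, 'html.parser')
tag = soup.find(id='driveValue')

# stripped_strings yields every text node inside the tag, in document order,
# with surrounding whitespace removed. The first one is the text that sits
# before the nested <span>.
first_text = next(tag.stripped_strings)
print(first_text)

# Illustrative follow-up (assumes the "A = B" shape): take the right-hand
# side of the "=" and convert it to a float.
rate = float(first_text.split('=')[1])
print(rate)
```

This only works when the element is present in the HTML that was parsed; if soup.find() returns None, both this and find_next() fail in the same way.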

Hope this helps:

from lxml import etree
txt = '''<span _ngcontent-his-c101="" id="driveValue" class="ng-binding ng-scope"> 1.00 TK = 779.8<span _ngcontent-his-c101="">Disk Drive Value</span>(DDV) </span>'''

root = etree.fromstring(txt)
for td in root.xpath('//span[contains(@class, "ng-binding ng-scope")]'):
    print(td.text)
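The same lxml approach can select the element by its id instead of its class. This is a small sketch on the same snippet, not a tested solution for the live page; like the code above, etree.fromstring() assumes the input is well-formed XML, which holds for this fragment but often not for a full web page.

```python
from lxml import etree

txt = '''<span _ngcontent-his-c101="" id="driveValue" class="ng-binding ng-scope"> 1.00 TK = 779.8<span _ngcontent-his-c101="">Disk Drive Value</span>(DDV) </span>'''

root = etree.fromstring(txt)

# Select the span by its id attribute rather than by its class.
matches = root.xpath('//span[@id="driveValue"]')

# .text holds only the text before the first child element,
# so the nested "Disk Drive Value" span is not included.
value = matches[0].text.strip()
print(value)
```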

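I tried to extract the value using id="driveValue", and the result is None.

Thanks Samsul. Is there any way to extract it using the id?

Why use XML parsing when the OP tagged the question with BeautifulSoup? Can this be done with BeautifulSoup?

Yes, it can be extracted using the id. I used XML parsing because lxml is an easy-to-use library for processing XML and HTML in Python.

It works fine when I use just the string, but here I need to read the content of the page, and I get the error AttributeError: 'NoneType' object has no attribute 'find_next'

@itgeek The page is probably loaded dynamically. See how to use selenium to scrape dynamic pages.

I also tried print(soup.find('span', id="driveValue")) and it prints "None".

Your solution helps when I want to extract the value from the string html = ''' 1.00 TK = 779.8 Disk Drive Value (DDV) '''. Thanks.

I am aware that selenium is an option. I would like to know whether I can do this without selenium.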
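The AttributeError in the comments comes from calling find_next() on the None that soup.find() returns when the element is missing from the downloaded HTML. A minimal sketch of a guard for that case follows; the function name extract_drive_value and the dynamic_html string are hypothetical stand-ins, the latter imitating a page whose content is filled in later by JavaScript.

```python
from bs4 import BeautifulSoup


def extract_drive_value(page_html):
    """Return the first text inside the element with id="driveValue", or None."""
    soup = BeautifulSoup(page_html, 'html.parser')
    tag = soup.find(id='driveValue')
    if tag is None:
        # The element is not in the HTML that was parsed. This is what
        # happens when the page inserts it with JavaScript after loading.
        return None
    # The default of None covers a tag that contains no text at all.
    return next(tag.stripped_strings, None)


static_html = '''<span _ngcontent-his-c101="" id="driveValue" class="ng-binding ng-scope"> 1.00 TK = 779.8<span _ngcontent-his-c101="">Disk Drive Value</span>(DDV) </span>'''

# Hypothetical stand-in for the raw HTML of a dynamically rendered page.
dynamic_html = '<html><body><app-root></app-root></body></html>'

print(extract_drive_value(static_html))
print(extract_drive_value(dynamic_html))
```

A None result here does not recover the value; it only confirms that the text is absent from the response that requests received, which is consistent with the page being rendered dynamically.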
