Extracting a value from a scraped web page with Python Beautiful Soup

How can I extract the value "1.00 TK = 779.8" from the HTML code below?

I tried the code below, but it does not work.

import requests
from bs4 import BeautifulSoup

page = requests.get(<url>).text

# Here is the HTML page content:
# <span _ngcontent-his-c101="" id="driveValue" class="ng-binding ng-scope"> 1.00 TK = 779.8<span _ngcontent-his-c101="">Disk Drive Value</span>(DDV) </span>

soup = BeautifulSoup(page, 'html.parser')
print(soup.find(id='driveValue').find_next(text=True).strip())
Use find_next(), which returns the first match:

from bs4 import BeautifulSoup

html = '''<span _ngcontent-his-c101="" id="driveValue" class="ng-binding ng-scope"> 1.00 TK = 779.8<span _ngcontent-his-c101="">Disk Drive Value</span>(DDV) </span>'''

soup = BeautifulSoup(html, 'html.parser')
print(soup.find(id='driveValue').find_next(text=True).strip())
Edit: using:

Output:

1.00 TK = 779.8
1.00 USD = 73.9375 Indian Rupee (INR)
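
As a possible alternative to find_next() in the answer above, here is a minimal sketch that keeps only the text nodes sitting directly inside the outer span, so the nested "Disk Drive Value" span is skipped. The names span, own_text and first_value are illustrative and not from the original answer.

```python
from bs4 import BeautifulSoup, NavigableString

html = '''<span _ngcontent-his-c101="" id="driveValue" class="ng-binding ng-scope"> 1.00 TK = 779.8<span _ngcontent-his-c101="">Disk Drive Value</span>(DDV) </span>'''

soup = BeautifulSoup(html, 'html.parser')
span = soup.find(id='driveValue')

# Keep only the text nodes that are direct children of the outer span;
# the nested <span> is a Tag, not a NavigableString, so it is skipped.
own_text = [child.strip() for child in span.children
            if isinstance(child, NavigableString)]

# The first direct text node is the value asked about in the question.
first_value = own_text[0]
print(first_value)
```

This only works when the element is present in the HTML that was parsed, so it has the same limitation as find_next() on pages that fill the value in later.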
Hope this helps:

from lxml import etree
txt = '''<span _ngcontent-his-c101="" id="driveValue" class="ng-binding ng-scope"> 1.00 TK = 779.8<span _ngcontent-his-c101="">Disk Drive Value</span>(DDV) </span>'''

root = etree.fromstring(txt)
for td in root.xpath('//span[contains(@class, "ng-binding ng-scope")]'):
    print(td.text)
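
Since the comments ask whether the id can be used instead of the class, here is a small sketch of the same lxml approach selecting by id. It assumes lxml is installed and swaps in lxml.html (the HTML parser) for etree.fromstring, which tends to be more forgiving with real pages; the names matches and value are illustrative.

```python
from lxml import html as lxml_html

txt = '''<span _ngcontent-his-c101="" id="driveValue" class="ng-binding ng-scope"> 1.00 TK = 779.8<span _ngcontent-his-c101="">Disk Drive Value</span>(DDV) </span>'''

# Parse the fragment with lxml's HTML parser (an assumption; the answer
# above uses the XML parser via etree.fromstring).
root = lxml_html.fromstring(txt)

# Select the span by its id attribute rather than by class.
matches = root.xpath('//span[@id="driveValue"]')

# .text holds only the text before the first child element.
value = matches[0].text.strip() if matches else None
print(value)
```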

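I tried extracting the value using id="driveValue", and the result was None.

Thanks Samsul. Is there any way to extract it using the "id"?

Why use XML parsing when the OP tagged the question with BeautifulSoup? Can this be done with BeautifulSoup?

Yes, it can be extracted using the id. I used XML parsing because lxml is an easy-to-use library for processing XML and HTML in Python.

It works fine when I use just the string, but here I need to read the content of the page, and I get the error AttributeError: 'NoneType' object has no attribute 'find_next'.

@itgeek The page is probably loaded dynamically. See using selenium to scrape dynamic pages.

I tried that too: print(soup.find('span', id="driveValue")) prints None. Your solution helps when I want to extract the value from the string html = ''' 1.00 TK = 779.8 Disk Drive Value (DDV) '''. Thanks.

I know it is possible with selenium. I would like to know whether I can do it without selenium.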
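
The AttributeError discussed in the comments comes from calling a method on the None that soup.find() returns when the element is missing from the downloaded HTML. A minimal sketch of a guard is below. The helper name extract_drive_value and the sample string empty_page are hypothetical, and string=True is used as the newer spelling of text=True.

```python
from bs4 import BeautifulSoup


def extract_drive_value(page_source):
    """Return the leading text of the element with id="driveValue", or None."""
    soup = BeautifulSoup(page_source, 'html.parser')
    tag = soup.find(id='driveValue')
    if tag is None:
        # The element is not in the HTML that was parsed, for example
        # because the page fills it in with JavaScript after loading.
        return None
    # First text node inside the tag, as in the find_next() answer.
    text = tag.find(string=True)
    return text.strip() if text else None


static_page = '''<span _ngcontent-his-c101="" id="driveValue" class="ng-binding ng-scope"> 1.00 TK = 779.8<span _ngcontent-his-c101="">Disk Drive Value</span>(DDV) </span>'''
# Hypothetical stand-in for a page whose value is not in the initial HTML.
empty_page = '<html><body><div id="app"></div></body></html>'

print(extract_drive_value(static_page))
print(extract_drive_value(empty_page))
```

A None result here only shows that the value is not in the response body. It does not by itself say how the page loads the value.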