How to extract content from an HTML string (Python, Scrapy)

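I want to extract the content of a DIV tag. I am using Scrapy to scrape some sites, but the problem is that the same DIV tag can hold two types of content. Sometimes it contains an <s> and a <b> child: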

["<div class=\"price\">\n                <s>Rs.330</s> <b>Rs.297</b>\n                              </div>"]
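
In other cases the same div contains only a single plain price (Rs.330) with no <s> or <b> children.

How can I extract the content from this tag in both cases?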

One option is to parse the string with BeautifulSoup (bs4) and read the text of the <s> and <b> children of the div, which gives 'Rs.330' and 'Rs.297' for the first form.

Scrapy uses XPath for scraping, so try:
//div[contains(@class,'price')]/s/text()
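Yes, I tried that, but the same div has two types of content for different products. For some items it holds both prices and for others only one, so how do I tell them apart? That is why I am looking for something else. The second form looks like this:
["<div class=\"price\">\n                Rs.330              \n</div>"]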
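# BeautifulSoup approach for the div that holds both <s> and <b>
import bs4

html = "<div class=\"price\">\n                <s>Rs.330</s> <b>Rs.297</b>\n                              </div>"
# "html.parser" ships with Python; the original used features="xml", which needs lxml
soup = bs4.BeautifulSoup(html, features="html.parser")
s = soup.div.s.text  # 'Rs.330'
b = soup.div.b.text  # 'Rs.297'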
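The snippet above raises an AttributeError on the plain form, because `soup.div.s` is None when the div has no <s> child. A hedged sketch of a variant that handles both forms follows. `parse_price` is a hypothetical helper, and reading <s> as the original price and <b> as the discounted one is an assumption based on the markup, not something the question states.

```python
import bs4


def parse_price(html):
    """Return (original, discounted); discounted is None if there is no offer.

    Hypothetical helper; assumes <s> holds the original price and <b>
    the discounted one.
    """
    soup = bs4.BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_="price")
    if div is None:
        return None, None
    # Tag attribute access returns None when no such child tag exists.
    if div.s is not None and div.b is not None:
        return div.s.get_text(strip=True), div.b.get_text(strip=True)
    # Plain form: the div holds a single price as bare text.
    return div.get_text(strip=True), None


both = '<div class="price">\n <s>Rs.330</s> <b>Rs.297</b>\n </div>'
plain = '<div class="price">\n Rs.330 \n</div>'
print(parse_price(both))   # ('Rs.330', 'Rs.297')
print(parse_price(plain))  # ('Rs.330', None)
```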