Python: how to get the price from inside a <script> tag

Tags: python, html, selenium, web-scraping, beautifulsoup

I am scraping a product page.

Part of the source is:

<script type="application/ld+json">
{
  "@context": "http://schema.org/",
  "@type": "Product",
  "name": "Flip Sequin Teach & Inspire Graphic Tee",
  "image": [
    "http://lanebryant.scene7.com/is/image/lanebryantProdATG/356861_0000015477",
    "http://lanebryant.scene7.com/is/image/lanebryantProdATG/356861_0000015477_Back"
  ],
  "description": "Get inspired with [...]",
  "brand": "Lane Bryant",
  "sku": "356861",
  "offers": {
    "@type": "Offer",
    "url": "https://www.lanebryant.com/flip-sequin-teach-inspire-graphic-tee/prd-356861",
    "priceCurrency": "USD",
    "price": "44.95",
    "availability": "http://schema.org/InStock",
    "itemCondition": "https://schema.org/NewCondition"
  }
}
</script>
where start is defined as:

from selenium import webdriver
from bs4 import BeautifulSoup

d = webdriver.Chrome('/Users/fatima.arshad/Downloads/chromedriver')
d.get(url)
start = BeautifulSoup(d.page_source, 'html.parser')
Even though I get the correct text, it does not print the price. How can I get just the price?

price1 = start.find('script', {'type': 'application/ld+json'})
This is actually the <script> tag, so it deserves a more descriptive name:

script_tag = start.find('script', {'type': 'application/ld+json'})
You can use .text to access the text inside the script tag. In this case, that gives you the JSON:

json_string = script_tag.text
Use a JSON parser instead of splitting on commas, so the data is not misinterpreted:

import json
clothing = json.loads(json_string)
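
Once the JSON is parsed, the price is an ordinary dictionary lookup. A minimal sketch, assuming the script tag's text looks like the snippet in the question (the json_string literal below is a trimmed, hand-copied stand-in, not the live page):

```python
import json

# Trimmed stand-in for script_tag.text; on the real page this string
# would come from the <script type="application/ld+json"> tag.
json_string = """
{
  "@context": "http://schema.org/",
  "@type": "Product",
  "name": "Flip Sequin Teach & Inspire Graphic Tee",
  "sku": "356861",
  "offers": {
    "@type": "Offer",
    "priceCurrency": "USD",
    "price": "44.95"
  }
}
"""

clothing = json.loads(json_string)

# The price is nested under "offers" and is stored as a string.
price = clothing["offers"]["price"]
currency = clothing["offers"]["priceCurrency"]
print(price, currency)
```

Note that the value is a string, so it would need converting before any arithmetic.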

In this case, you can just use a regex to get the price:

import requests, re

r = requests.get('https://www.lanebryant.com/flip-sequin-teach-inspire-graphic-tee/prd-356861#color/0000015477', headers = {'User-Agent':'Mozilla/5.0'})
p = re.compile(r'"price":"(.*?)"')
print(p.findall(r.text)[0])
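
The regex approach can be tried without hitting the site. The sketch below runs a slightly more tolerant pattern against a hand-made sample string; both the sample and the \s* allowance for whitespace around the colon are assumptions added here, not part of the answer above:

```python
import re

# Hand-made stand-in for r.text (assumption: the live markup may differ).
page_text = '''
"priceCurrency": "USD",
"price":"44.95",
"availability": "http://schema.org/InStock"
'''

# \s* tolerates optional whitespace around the colon; (.*?) is non-greedy,
# so the capture stops at the first closing quote.
pattern = re.compile(r'"price"\s*:\s*"(.*?)"')

match = pattern.search(page_text)
price = match.group(1) if match else None
print(price)
```

Using search and checking for None avoids the IndexError that findall(...)[0] raises when nothing matches.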
Otherwise, target the appropriate script tag by id and then parse its .text with the json library:

import requests, json
from bs4 import BeautifulSoup 

r = requests.get('https://www.lanebryant.com/flip-sequin-teach-inspire-graphic-tee/prd-356861#color/0000015477', headers = {'User-Agent':'Mozilla/5.0'})
start = BeautifulSoup(r.text, 'html.parser')
data = json.loads(start.select_one('#pdpInitialData').text)
price = data['pdpDetail']['product'][0]['price_range']['sale_price']
print(price)

This gives the error: json.decoder.JSONDecodeError: Invalid control character at: line 10 column 316 (char 571).
Could you update the question with a bit of the outerHTML?
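
One possible cause of the Invalid control character error, offered as a guess since the failing markup is not shown: the JSON contains a raw newline or tab inside a string value. Python's json.loads rejects these by default but accepts them when called with strict=False. A small sketch with a made-up string:

```python
import json

# Made-up JSON with a raw newline inside a string value (an assumption
# about what the page might contain; the real payload is not shown).
raw = '{"description": "line one\nline two", "price": "44.95"}'

try:
    json.loads(raw)
except json.JSONDecodeError as err:
    print("strict parse failed:", err.msg)

# strict=False lets control characters appear inside strings.
data = json.loads(raw, strict=False)
print(data["price"])
```

If the error has a different cause, inspecting the characters around the reported offset (char 571 in the message above) should show what the parser tripped on.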