Regex for extracting data from a script tag in Python


I have the following Python code:

import sys, os, requests, datetime, time
from bs4 import BeautifulSoup
import urllib.request
import re
import json

def get_html(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'}
    r = requests.get(url, headers=headers)
    return r.content

link = 'https://www.clubx.com.au/products/womanizer-pro?variant=37834367948'
soup = BeautifulSoup(get_html(link), 'html.parser')
obj = soup.find_all('script')[18]
m = re.search(r"\"variants\":\[(.*?\])", obj.string)
if m:
    data = json.load(m.group(1))
    print(data)
Use the regex pattern

r"\"variants\":\[(.*?)\]"

Demo:

from bs4 import BeautifulSoup
import json
import re

s = """<script>var BOLD = BOLD || {};
    BOLD.products = BOLD.products || {};
    BOLD.variant_lookup = BOLD.variant_lookup || {};BOLD.variant_lookup[31066737740] ="womanizer";BOLD.variant_lookup[31066737804] ="womanizer";BOLD.variant_lookup[31066737868] ="womanizer";BOLD.variant_lookup[31066737996] ="womanizer";BOLD.variant_lookup[1509908217881] ="womanizer";BOLD.products["womanizer"] ={"id":8993669708,"title":"Womanizer","variants":[{"id":37834367948,"title":"Black","option1":"Black","option2":null,"option3":null,"sku":"1725205212"}]}
    </script>
"""

soup = BeautifulSoup(s, "html.parser")
src = soup.find("script")
m = re.search(r"\"variants\":\[(.*?)\]", src.string)
if m:
    data = json.loads(m.group(1))
    print(data)

Output (Python 2):

{u'sku': u'1725205212', u'title': u'Black', u'id': 37834367948L, u'option2': None, u'option3': None, u'option1': u'Black'}
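
The demo has a single script tag, whereas the question picks the script on the live page by position (`find_all('script')[18]`), which breaks as soon as the page adds or reorders scripts. A minimal sketch of one alternative is to select the script by its content. The helper name `find_script_containing` and the sample HTML are made up for illustration and are not taken from the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical page with several script tags; only one holds the product data.
html = """<html><head>
<script>var a = 1;</script>
<script src="app.js"></script>
<script>BOLD.products["womanizer"] ={"id":8993669708,"variants":[{"id":37834367948,"title":"Black"}]}</script>
</head></html>"""


def find_script_containing(soup, marker):
    """Return the first <script> tag whose text contains marker, else None."""
    for tag in soup.find_all("script"):
        # tag.string is None for scripts without inline text (e.g. src-only tags).
        if tag.string and marker in tag.string:
            return tag
    return None


soup = BeautifulSoup(html, "html.parser")
src = find_script_containing(soup, '"variants":')
print(src is not None)
```

The returned tag can then be passed to `re.search` exactly as `src` is in the demo.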

It works when `s` is a plain string, but that is not what I have in my case, so an error occurs:

json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 495 (char 494)

My code has the following structure:

def get_html(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'}
    r = requests.get(url, headers=headers)
    return r.content

link = ''
soup = BeautifulSoup(get_html(link), 'html.parser')
obj = soup.find_all('script')[18]
m = re.search(r"\"variants\":\[(.*?)\]", obj.string)
if m:
    data = json.loads(m.group(1))
    print(data)

What does `obj.string` print?

It prints the content.

Add it to your question.
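
One possible cause of the `Expecting ',' delimiter` error, assuming the variant objects on the live page contain nested arrays (the question does not show them): the non-greedy `(.*?)\]` stops at the first `]`, so the captured text is cut off in the middle of an object. A hedged sketch that sidesteps this with the standard library's `json.JSONDecoder.raw_decode`, which parses one complete JSON value starting at a given offset. The function `extract_variants` and the sample `script_text` (including its `"options"` array) are hypothetical:

```python
import json
import re

# Hypothetical script text. The nested "options" array is an assumption
# about the live page, not something shown in the question.
script_text = (
    'BOLD.products["womanizer"] ={"id":8993669708,"title":"Womanizer",'
    '"variants":[{"id":37834367948,"title":"Black","options":["Black"],'
    '"sku":"1725205212"}]}'
)


def extract_variants(text):
    """Return the parsed "variants" array, or None if the key is not found."""
    m = re.search(r'"variants"\s*:\s*', text)
    if m is None:
        return None
    # raw_decode parses one complete JSON value starting at index m.end()
    # and ignores whatever follows it, so nested brackets are not a problem.
    variants, _end = json.JSONDecoder().raw_decode(text, m.end())
    return variants


variants = extract_variants(script_text)
print(variants[0]["sku"])
```

Because the whole array is parsed, this also returns every variant as a list instead of relying on the captured text being a single JSON object.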