Python: using regex (and requests) to scrape the text after a span

I have an unformatted, messy bs4.BeautifulSoup element from a web page. The soup looks like this:

soup = 'Jul 30,2021"},{"id":"50014999","description":null,"displayValue":"M","value":"M","selected":false,"selectable":true,"url":"https://www.xzy.com/on/demandware.store/Sites-RoW-Site/en_DE/Product-Variation?dwvar_2947_pv_rahmenfarbe=YE%2FBK&dwvar_2947_pv_rahmengroesse=M&pid=2947&quantity=1","hasComingSoon":true,"hasAllComingSoonAttr":true,"configurationUrl":"https://www.xzy.com/on/demandware.store/Sites-RoW-Site/en_DE/Product-Configure?pid=2947&dwvar_2947_pv_rahmengroesse=M&dwvar_2947_pv_rahmenfarbe=YE%2fBK","sizeMin":178,"sizeMax":184,"measurementInterval":"178 cm - 184 cm","comingSoonReason":"ProductorReferenceInstockDate","comingSoon":true,"availability":{"messages":["Back order"],"inStockDate":"2021-08-09T00:00:00.000Z","onlyXLeftNumber":122,"onlyXLeft":false,"lowStock":false,"shippingInfo":"Arriving August 2021","available":false,"availableSufficient":false,"notifyMe":true,"showOutOfStock":false,"similarBikes":false,"comingSoonByBackOrderAllocation":false},"hasSuccessorProduct":false,"comingSoonMessage":"Arriving August 2021"},
{"id":"50015000","description":null,"displayValue":"L","value":"L","selected":false,"selectable":true,"url":"https://www.xzy.com/on/demandware.store/Sites-RoW-Site/en_DE/Product-Variation?dwvar_2947_pv_rahmenfarbe=YE%2FBK&dwvar_2947_pv_rahmengroesse=L&pid=2947&quantity=1","hasComingSoon":true,"hasAllComingSoonAttr":true,"configurationUrl":"https://www.xzy.com/on/demandware.store/Sites-RoW-Site/en_DE/Product-Configure?pid=2947&dwvar_2947_pv_rahmengroesse=L&dwvar_2947_pv_rahmenfarbe=YE%2fBK","sizeMin":184,"sizeMax":190,"measurementInterval":"184 cm - 190 cm","comingSoonReason":"ProductorReferenceInstockDate","comingSoon":true,"availability":{"messages":["Back order"],"inStockDate":"2021-08-16T00:00:00.000Z","onlyXLeftNumber":96,"onlyXLeft":false,"lowStock":false,"shippingInfo":"Arriving August 2021","available":false,"availableSufficient":true,"notifyMe":true,"showOutOfStock":false,"similarBikes":false,"comingSoonByBackOrderAllocation":false},"hasSuccessorProduct":false,"comingSoonMessage":"Arriving August 2021"},
{"id":"50015001","description":null,"displayValue":"XL","value":"XL","selected":false,"selectable":true,"url":"https://www.xzy.com/on/demandware.store/Sites-RoW-Site/en_DE/Product-Variation?dwvar_2947_pv_rahmenfarbe=YE%2FBK&dwvar_2947_pv_rahmengroesse=XL&pid=2947&quantity=1","hasComingSoon":true,"hasAllComingSoonAttr":true,"configurationUrl":"https://www.xzy.com/on/demandware.store/Sites-RoW-Site/en_DE/Product-Configure?pid=2947&dwvar_2947_pv_rahmengroesse=XL&dwvar_2947_pv_rahmenfarbe=YE%2fBK","sizeMin":190,"sizeMax":196,"measurementInterval":"190 cm - 196 cm","comingSoonReason":"ProductorReferenceInstockDate","comingSoon":true,"availability":{"messages":["Back order"],"inStockDate":"2021-08-09T00:00:00.000Z","onlyXLeftNumber":38,"onlyXLeft":false,"lowStock":false,"shippingInfo":"Arriving August 2021","available":false,"availableSufficient":true,"notifyMe":true,"showOutOfStock":false,"similarBikes":false,"comingSoonByBackOrderAllocation":false},"hasSuccessorProduct":false,"comingSoonMessage":"Arriving August 2021"},
{"id":"50015002","description":null,"displayValue":"2XL","value":"2XL","selected":false,"selectable":true,"url":"https://www.xzy.com/on/demandware.store/Sites-RoW-Site/en_DE/Product-Variation?dwvar_2947_pv_rahmenfarbe=YE%2FBK&dwvar_2947_pv_rahmengroesse=2XL&pid=2947&quantity=1","hasComingSoon":false,"hasAllComingSoonAttr":false,"configurationUrl":"https://www.xzy.com/on/demandware.store/Sites-RoW-Site/en_DE/Product-Configure?pid=2947&dwvar_2947_pv_rahmengroesse=2XL&dwvar_2947_pv_rahmenfarbe=YE%2fBK","sizeMin":196,"sizeMax":999,"measurementInterval":"> 196 cm","comingSoonReason":"","comingSoon":false,"availability":{"messages":["Back order"],"inStockDate":"2021-07-26T00:00:00.000Z","onlyXLeftNumber":10,"onlyXLeft":false,"lowStock":false,"shippingInfo":"Shipping Jul 26,2021-Jul 30","available":true,"availableSufficient":true,"notifyMe":false,"showOutOfStock":false,"similarBikes":false,"comingSoonByBackOrderAllocation":false},"hasSuccessorProduct":false,"comingSoonMessage":"Shipping Jul 26,2021-Jul 30"}],"resetUrl":"https://www.xzy.com/on/demandware.store/Sites-RoW-Site/en_DE/Product-Variation?dwvar_2947_pv_rahmenfarbe=YE%2FBK&dwvar_2947_pv_rahmengroesse=&pid=2947&quantity=1","hasSelectedValue":false,"isLastAttributeOnPDP":true,"colorAttribute":false,"sizeAttribute":true,"buttonAttribute":false,"damagedAttribute":false}]};'
I need the element that follows
span class=\"productConfiguration__shippingDateEnd\"
that is, the "id" dictionary, so that after searching I get something like this:

{"id":"50015002","description":null,"displayValue":"2XL","value":"2XL","selected":false,"selectable":true,"url":"https://www.xzy.com/on/demandware.store/Sites-RoW-Site/en_DE/Product-Variation?dwvar_2947_pv_rahmenfarbe=YE%2FBK&dwvar_2947_pv_rahmengroesse=2XL&pid=2947&quantity=1","hasComingSoon":false,"hasAllComingSoonAttr":false,"configurationUrl":"https://www.xzy.com/on/demandware.store/Sites-RoW-Site/en_DE/Product-Configure?pid=2947&dwvar_2947_pv_rahmengroesse=2XL&dwvar_2947_pv_rahmenfarbe=YE%2fBK","sizeMin":196,"sizeMax":999,"measurementInterval":"> 196 cm","comingSoonReason":"","comingSoon":false,"availability":{"messages":["Back order"],"inStockDate":"2021-07-26T00:00:00.000Z","onlyXLeftNumber":10,"onlyXLeft":false,"lowStock":false,"shippingInfo":"Shipping}' 
If I do
soup1.find_all('span', class_='productConfiguration__shippingDateEnd')
I only get the result below. Also,
.next_siblings
does not return anything.

[Jul 30, 2021, Jul 30, 2021]
Do you know how I should proceed from here?


Any help is much appreciated.

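What I see on the page differs slightly from what you show, but it does contain the stock information for the different sizes. You can use a regex to extract the string and then use json to convert that string into a JSON object.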

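import json
import re
from pprint import pprint

import requests
from bs4 import BeautifulSoup as bs

# Fetch the product page; the stock data sits in an inline script, not in the HTML tree
r = requests.get('https://www.canyon.com/en-de/road-bikes/endurance-bikes/endurace/cf-sl/endurace-cf-sl-7-disc/2947.html?dwvar_2947_pv_rahmenfarbe=YE%2FBK')
# Capture the object literal assigned to window.deptsfra
s = re.search(r'window\.deptsfra=(.*);', r.text).group(1)
#print(s)
data = json.loads(s)
print(data)

# Here the entry at index 1 of variationAttributes carries the per-size values
pprint(data['productDetail']['variationAttributes'][1]['values'])

# Print each size together with its availability details
for i in data['productDetail']['variationAttributes'][1]['values']:
    print(i['value'], i['availability'])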
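
The same extract-then-parse idea can be tried offline. The following is only a minimal sketch: `SAMPLE_HTML`, its attribute ids and shipping texts, and the `extract_state` helper are made up for illustration and are not taken from the real page. It also swaps the greedy `(.*);` capture for `json.JSONDecoder.raw_decode`, which reads exactly one JSON value and ignores whatever script code follows it; that is an alternative technique, not what the answer above uses.

```python
import json
import re

# Hypothetical stand-in for r.text; the real page is far larger.
SAMPLE_HTML = (
    '<script>window.deptsfra = {"productDetail": {"variationAttributes": ['
    '{"id": "pv_rahmenfarbe", "values": []}, '
    '{"id": "pv_rahmengroesse", "values": ['
    '{"value": "M", "availability": {"available": false, "shippingInfo": "Arriving August 2021"}}, '
    '{"value": "2XL", "availability": {"available": true, "shippingInfo": "Shipping Jul 26, 2021 - Jul 30"}}'
    ']}]}}; var other = 1;</script>'
)


def extract_state(html, var_name="window.deptsfra"):
    """Return the JSON object assigned to var_name inside html (hypothetical helper)."""
    match = re.search(re.escape(var_name) + r"\s*=\s*", html)
    if match is None:
        raise ValueError(f"{var_name} not found in page source")
    # raw_decode parses one JSON value starting at the given index and
    # ignores the text after it (here: '; var other = 1;</script>').
    data, _end = json.JSONDecoder().raw_decode(html, match.end())
    return data


data = extract_state(SAMPLE_HTML)
sizes = data["productDetail"]["variationAttributes"][1]["values"]
for size in sizes:
    print(size["value"], size["availability"]["available"])
```

Raising an error when the variable is missing makes a changed page layout obvious, instead of failing later with an `AttributeError` on `None`.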

To get the values shown in the table as a dict:

results = {i['value']: (bs(i['availability']['shippingInfo'], 'html.parser').get_text() if '<' in i['availability']['shippingInfo'] else i['availability']['shippingInfo']) for i in data['productDetail']['variationAttributes'][1]['values']}
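
The dict comprehension above strips markup from `shippingInfo` only when the string contains a `<`. A minimal offline sketch of that idea follows. It uses the standard library's `html.parser` in place of BeautifulSoup so it runs without third-party packages, and the `values` list, its shipping texts and the `strip_tags` helper are assumptions made for illustration; only the class name `productConfiguration__shippingDateEnd` comes from the question.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(fragment):
    """Return fragment without markup; plain strings pass through unchanged."""
    if "<" not in fragment:
        return fragment
    parser = _TextOnly()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)


# Hypothetical sample shaped like data['productDetail']['variationAttributes'][1]['values'].
values = [
    {"value": "M", "availability": {"shippingInfo": "Arriving August 2021"}},
    {
        "value": "2XL",
        "availability": {
            "shippingInfo": 'Shipping Jul 26, 2021 - '
            '<span class="productConfiguration__shippingDateEnd">Jul 30, 2021</span>'
        },
    },
]

results = {v["value"]: strip_tags(v["availability"]["shippingInfo"]) for v in values}
print(results)
```

Pulling the conditional into a small helper keeps the comprehension short and avoids repeating the long `i['availability']['shippingInfo']` lookup.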

Can you share a link to the web page?
@QHarr For example, I am looking for the availability of the product sizes, e.g. XL, XS, etc.
Which service did you use to explain the regex?
@EmmanuelMtali @QHarr Regex is very useful here. Thank you very much.