How to get data from dataLayer.push using Python web scraping
My code is:
# init scrapy selector
response = Selector(text=content)
json_data = json.loads(script.get() for script in (re.findall(r'dataLayer\.push\(([^)]+)'),response.css('script::text'))).group(1)
print(json_data)
# debug data extraction logic
HummartScraper.parse_product(HummartScraper, '')
The error output is:
Traceback (most recent call last):
File "hummart2.py", line 86, in parse_product
json_data = json.loads(script.get() for script in (re.findall(r'dataLayer\.push\(([^)]+)'),response.css('script::text'))).group(1)
TypeError: findall() missing 1 required positional argument: 'string'
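Why does this error occur?

re.findall() requires two arguments, the pattern and the string to search, but the code above passes only the pattern. Scrapy selectors can apply the regex for you instead. For a single dataLayer: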
data_layer = response.css('script::text').re_first(r'dataLayer\.push\(([^)]+)')
data = json.loads(data_layer)
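Outside Scrapy, the same idea can be sketched with the standard library alone. This is a minimal sketch, assuming the script text is already available as a string and that the pushed object is valid JSON; SAMPLE_SCRIPT and first_data_layer are hypothetical names used only for illustration. Note that re.search, like re.findall, takes the text to search as its second argument (here it is called on a compiled pattern, so only the text is passed).

```python
import json
import re

# Same pattern as the answer: capture everything after "dataLayer.push(" up to the first ")"
DATA_LAYER_RE = re.compile(r'dataLayer\.push\(([^)]+)')

# Hypothetical script text standing in for response.css('script::text')
SAMPLE_SCRIPT = (
    'window.dataLayer = window.dataLayer || [];\n'
    'dataLayer.push({"pageType": "product", "productId": "123"});'
)


def first_data_layer(script_text):
    """Hypothetical helper: decode the first dataLayer.push payload, or return None."""
    match = DATA_LAYER_RE.search(script_text)  # compiled pattern: only the string is needed
    if match is None:
        return None  # same behaviour as Scrapy's re_first() when nothing matches
    return json.loads(match.group(1))


data = first_data_layer(SAMPLE_SCRIPT)
print(data["productId"])  # prints 123
```

Returning None explicitly keeps the "no match" case visible to the caller rather than letting json.loads fail on it.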
You can use response.css(…).re() to get a list of matches.
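For pages that call dataLayer.push more than once, the list-based approach can be sketched with the standard library. Here re.findall gets both required arguments, the pattern and the text to search; leaving out the text is what raised "findall() missing 1 required positional argument: 'string'" in the question. The scripts list is hypothetical sample input, standing in for what response.css('script::text').getall() might return.

```python
import json
import re

PATTERN = r'dataLayer\.push\(([^)]+)'

# Hypothetical <script> texts, one string per script element
scripts = [
    'dataLayer.push({"event": "pageview"});',
    'console.log("no tracking call in this script");',
    'dataLayer.push({"event": "view_item", "price": 250});',
]

all_data = []
for text in scripts:
    # re.findall(pattern, string) returns one captured payload per match in this script
    for payload in re.findall(PATTERN, text):
        all_data.append(json.loads(payload))

print(all_data)  # [{'event': 'pageview'}, {'event': 'view_item', 'price': 250}]
```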
But this gives me the following error:
File "hummart2.py", line 88, in parse_product
data = json.loads(data_layer_raw)[1]
File "/home/danish-khan/miniconda3/lib/python3.7/json/__init__.py", line 341, in loads
raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not NoneType
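The NoneType in this traceback means the regex found no match: Scrapy's re_first() returns None in that case, and json.loads(None) raises exactly this TypeError. A minimal guard, sketched with the standard library and a hypothetical parse_data_layer helper, checks for a missing match before decoding so the failure is reported clearly. The raw_scripts input is made up for illustration.

```python
import json
import re
from typing import Any, List, Optional

PATTERN = re.compile(r'dataLayer\.push\(([^)]+)')


def parse_data_layer(script_texts: List[str]) -> Optional[Any]:
    """Hypothetical helper: decode the first dataLayer.push payload, or return None."""
    for text in script_texts:
        match = PATTERN.search(text)
        if match:
            return json.loads(match.group(1))
    # No script matched: this is the case where re_first() would return None
    return None


raw_scripts = ['var x = 1;']  # hypothetical page with no dataLayer.push call
data = parse_data_layer(raw_scripts)
if data is None:
    print("dataLayer.push not found; check the regex against the page source")
```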
What error do you get?

It looks like your regex is wrong. Can you show me a source page?

This is the source page: 'view source:'
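One possible reason a ([^)]+) pattern fails is that the pushed object itself contains a ")" (for example inside a product name), so the capture stops early and the result is not valid JSON. A sketch that avoids this, assuming the payload is strict JSON, finds the position right after dataLayer.push( and lets json.JSONDecoder.raw_decode decide where the object ends. The sample text and the extract_push_payloads name are hypothetical. If the site writes the object as a JavaScript literal (single quotes, unquoted keys), the json module cannot parse it with either approach.

```python
import json
import re
from typing import Any, List

# Match the call and any whitespace before the argument; raw_decode does not skip whitespace
PUSH_RE = re.compile(r'dataLayer\.push\(\s*')
decoder = json.JSONDecoder()


def extract_push_payloads(text: str) -> List[Any]:
    """Hypothetical helper: decode every JSON payload passed to dataLayer.push(...)."""
    payloads = []
    for match in PUSH_RE.finditer(text):
        try:
            # raw_decode parses one JSON value starting at the given index and returns
            # (value, end_index), so parentheses inside strings do not cut it short
            value, _ = decoder.raw_decode(text, match.end())
        except json.JSONDecodeError:
            continue  # not strict JSON (e.g. a JS object literal); skip it
        payloads.append(value)
    return payloads


sample = 'dataLayer.push({"name": "Tea (500g)", "price": 250});'
print(extract_push_payloads(sample))  # [{'name': 'Tea (500g)', 'price': 250}]
```

With the original pattern, the same sample would capture only {"name": "Tea (500g, which json.loads rejects.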