Getting the value of a JavaScript variable with Python

Tags: python, web-scraping

I am scraping an Instagram page and getting its scripts (HTML and some JavaScript). The result looks like this:

<script>some script</script>
<script>some script</script>
<script>some script</script>
<script>window._sharedData = {"config":{"csrf_token":"sSqrj6c8tfN1HwOIlwmpqONT2bAPhtNu","viewer":null etc....</script>

Running this code:

import urllib.request
import json
import re
from bs4 import BeautifulSoup

web = urllib.request.urlopen("https://instagram.com/celmirashop")
soup = BeautifulSoup(web.read(), 'lxml')
# Escape the dot so it matches a literal "." in "window._sharedData"
pattern = re.compile(r"window\._sharedData = .")
# Find the one <script> tag whose text matches the pattern
script = soup.find("script", string=pattern)
print(script)

gives me a single result, the specific script I want, like this:

<script>window._sharedData = {"config":{"csrf_token":"sSqrj6c8tfN1HwOIlwmpqONT2bAPhtNu","viewer":null etc....</script>

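How can I get the value of window._sharedData and loop over it? I want to save it in MySQL.

Assuming the assignment ends with ; and occurs only once, you can use the following regex pattern on response.text: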

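import re

# Sample input; in practice this would be response.text
s = '''<script>window._sharedData = {"config":{"csrf_token":"sSqrj6c8tfN1HwOIlwmpqONT2bAPhtNu","viewer":null}};</script>'''
# Capture everything between "window._sharedData = " and the last ";" on the line
p = re.compile(r'window\._sharedData = (.*);')
print(p.findall(s)[0])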
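
The captured group is a JSON string, so it can be turned into a Python dictionary with json.loads and then looped over. A minimal sketch, assuming the sample string is a shortened, valid-JSON stand-in for the real page source and that the assignment ends with ; as stated above (the names html and shared_data are only illustrative):

```python
import json
import re

# Shortened, valid-JSON stand-in for the real page source (assumption).
html = '<script>window._sharedData = {"config":{"csrf_token":"sSqrj6c8tfN1HwOIlwmpqONT2bAPhtNu","viewer":null}};</script>'

# Same pattern as above: capture the text between "= " and the last ";".
match = re.search(r'window\._sharedData = (.*);', html)
shared_data = json.loads(match.group(1))  # JSON text -> Python dict

# Loop over the nested "config" object.
for key, value in shared_data["config"].items():
    print(key, value)
```

JSON null becomes Python None, so values may need a check before being written to a database column.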

Here is one way to do it:

>>> xxx = '''
... <script>window._sharedData = {"config":{"csrf_token":"sSqrj6c8tfN1HwOIlwmpqONT2bAPhtNu","viewer":null etc....</script>
... '''
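
The answer above is cut off after the sample string, so its actual method is not shown here. As a separate sketch (not the answerer's code), one regex-free possibility is to locate the assignment and let json.JSONDecoder.raw_decode parse a single JSON value from that position, ignoring whatever follows it. The sample string below is completed into valid JSON purely for illustration, and marker is a hypothetical helper name:

```python
import json

# Illustrative sample: the truncated string above, shortened to valid JSON (assumption).
xxx = '''
<script>window._sharedData = {"config":{"csrf_token":"sSqrj6c8tfN1HwOIlwmpqONT2bAPhtNu","viewer":null}};</script>
'''

marker = "window._sharedData = "
start = xxx.index(marker) + len(marker)  # index of the opening "{"

# raw_decode parses one JSON value beginning at `start` and returns it
# together with the index just past that value; the trailing
# ";</script>" is simply left unparsed.
data, end = json.JSONDecoder().raw_decode(xxx, start)

print(data["config"]["csrf_token"])
```

This does not depend on where the last ; falls on the line, which can matter if the script contains further statements after the assignment.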

return JSON.stringify(window._sharedData)
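
The question also asks how to loop over the value and save it to MySQL. A rough sketch of that step, under several assumptions: the data has already been parsed into a dictionary, sqlite3 stands in for MySQL so the example runs without a server, and the table name shared_config and its columns are hypothetical:

```python
import json
import sqlite3

# Hypothetical, shortened stand-in for the parsed window._sharedData value.
shared_data = json.loads(
    '{"config":{"csrf_token":"sSqrj6c8tfN1HwOIlwmpqONT2bAPhtNu","viewer":null}}'
)

# sqlite3 is used only so the sketch is self-contained; with a MySQL
# driver such as PyMySQL the placeholders would be %s instead of ?.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shared_config (name TEXT PRIMARY KEY, value TEXT)")

# Loop over the "config" object and store each value as JSON text,
# so nested objects and null survive the round trip.
rows = [(key, json.dumps(value)) for key, value in shared_data["config"].items()]
conn.executemany("INSERT INTO shared_config (name, value) VALUES (?, ?)", rows)
conn.commit()

for name, value in conn.execute("SELECT name, value FROM shared_config ORDER BY name"):
    print(name, value)
```

Passing the values as query parameters rather than formatting them into the SQL string keeps quoting and escaping in the hands of the driver.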