Getting the value of a JavaScript variable in Python
I am scraping an Instagram page and getting its scripts (HTML and some JavaScript). The result looks like this:
<script>some script</script>
<script>some script</script>
<script>some script</script>
<script>window._sharedData = {"config":{"csrf_token":"sSqrj6c8tfN1HwOIlwmpqONT2bAPhtNu","viewer":null etc....</script>
I then run this code:
import re
import urllib.request

from bs4 import BeautifulSoup

# Download the profile page and parse it
web = urllib.request.urlopen("https://instagram.com/celmirashop")
soup = BeautifulSoup(web.read(), 'lxml')
# Find the <script> tag whose text contains the window._sharedData assignment
pattern = re.compile(r"window\._sharedData = ")
script = soup.find("script", string=pattern)
print(script)
This gives me a single result, the specific script I want:
<script>window._sharedData = {"config":{"csrf_token":"sSqrj6c8tfN1HwOIlwmpqONT2bAPhtNu","viewer":null etc....</script>
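How can I get the value of window._sharedData and loop over it? I want to save it in MySQL.
Assuming the assignment ends with ; and occurs only once, you can use the following regex pattern on response.text: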
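import re
# Sample input standing in for response.text
s = '''<script>window._sharedData = {"config":{"csrf_token":"sSqrj6c8tfN1HwOIlwmpqONT2bAPhtNu","viewer":null}};</script>'''
# Capture everything between the assignment and the last ';' on the line
p = re.compile(r'window\._sharedData = (.*);')
print(p.findall(s)[0])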
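To loop over the value, the captured text can be parsed as JSON. The following is a minimal sketch, assuming the captured text is valid JSON; the shortened sample string and the helper name `extract_shared_data` are made up for illustration and are not part of the original answer.

```python
import json
import re

# Hypothetical sample standing in for response.text; the real payload is much larger.
html = '<script>window._sharedData = {"config": {"csrf_token": "abc123", "viewer": null}};</script>'


def extract_shared_data(text):
    """Return window._sharedData as a Python dict, or None if it is not found."""
    match = re.search(r'window\._sharedData = (.*);', text)
    if match is None:
        return None
    # JSON null becomes None, objects become dicts, arrays become lists.
    return json.loads(match.group(1))


shared_data = extract_shared_data(html)

# Loop over the top-level keys; nested objects are ordinary dicts.
for key, value in shared_data.items():
    print(key, value)
```

Once the data is a plain dict, each value can be passed to whatever database layer is in use, for example as parameters of an INSERT statement.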
Here is one way:
>>> xxx = '''
... <script>window._sharedData = {"config":{"csrf_token":"sSqrj6c8tfN1HwOIlwmpqONT2bAPhtNu","viewer":null etc....</script>
... '''
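One possible way to pull the object out of a string like `xxx` is to let the JSON decoder find the end of the object itself, which avoids depending on where the trailing `;` sits. This is only a sketch, assuming the text right after `window._sharedData = ` is a complete JSON object; the sample string and the names `page` and `marker` are illustrative.

```python
import json

# Hypothetical sample; note the ';' inside a string value, which a regex could trip over.
page = '''
<script>window._sharedData = {"config": {"csrf_token": "abc123", "viewer": null}, "note": "a;b"};</script>
'''

marker = "window._sharedData = "
start = page.find(marker)
if start == -1:
    raise ValueError("window._sharedData not found")

# raw_decode parses one JSON value starting at the given index and
# returns it together with the index where that value ended.
data, end = json.JSONDecoder().raw_decode(page, start + len(marker))

print(data["config"]["csrf_token"])
```

Because the decoder stops at the end of the object, anything that follows it (the `;`, the closing tag) is simply ignored.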
return JSON.stringify(window._sharedData) nice+