Python Beautiful Soup | Extract a variable from JavaScript


I am using BeautifulSoup to scrape data from an HTML page that has several columns under a table body.

Please find the mock code below:

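from bs4 import BeautifulSoup
import requests
import urllib.request as urllib2
import re
import json

# myUrl is the address of the page being scraped (not shown in the question)
app_page = urllib2.urlopen(myUrl)
soup = BeautifulSoup(app_page, "html.parser")  # name the parser explicitly
print(soup.prettify())

# the eighth <script> tag on the page holds the table data
data = soup.find_all("script")[7]
data = re.sub("\n", "", str(data))
print(data)
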
Output:


var appsTableData=[[

Use .string to get the text of the script tag, then use str.replace to strip the variable assignment.

Ex:

data = soup.find_all("script")[7].string
print(data.replace("var appsTableData=", ""))

Output:

[[<"<a href='Something'/>"]]

Use BeautifulSoup together with re.compile:

import re
from bs4 import BeautifulSoup

data = '''<script type="text/javascript">              var appsTableData=[[<"<a href='Something'/>"]]</script>'''
soup = BeautifulSoup(data, "html.parser")

# find the <script> tag whose text contains the variable, instead of relying on its position
withbs = soup.find('script', string=re.compile('var appsTableData'))
withbs = withbs.text.replace('var appsTableData=', '').strip()
print(withbs)
Result:

[[<"<a href='Something'/>"]]
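
The same idea can be wrapped in a small helper that returns None when the variable is not on the page, rather than raising an AttributeError. This is only a sketch: the function name extract_js_assignment and the sample HTML are made up for illustration, and it assumes the assignment runs to the end of the script tag.

```python
import re
from bs4 import BeautifulSoup

# Sample markup modelled on the snippet above (an assumption, not the real page).
html = '''<html><body><script type="text/javascript">
    var appsTableData=[[<"<a href='Something'/>"]]</script></body></html>'''


def extract_js_assignment(html_text, var_name):
    """Return the text assigned to var_name in the first matching <script>, or None."""
    soup = BeautifulSoup(html_text, "html.parser")
    # DOTALL lets "." cross newlines, so the capture runs to the end of the script text.
    pattern = re.compile(r"var\s+" + re.escape(var_name) + r"\s*=\s*(.*)", re.DOTALL)
    script = soup.find("script", string=pattern)
    if script is None or script.string is None:
        return None
    # Drop surrounding whitespace and a trailing semicolon, if any.
    return pattern.search(script.string).group(1).strip().rstrip(";")


found = extract_js_assignment(html, "appsTableData")
missing = extract_js_assignment(html, "missingVar")
print(found)
print(missing)
```

If the script tag holds more than one statement, the greedy capture would need an explicit stopping point.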

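Your question is hard to follow. It fails because *? is lazy. Changing it to greedy will not work either; you need something that tells it where to stop matching.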
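
To illustrate the point about lazy and greedy matching, here is a minimal sketch. The pattern from the question is not shown, so the patterns and the sample text below (including the trailing semicolon and the second variable) are assumptions chosen only to show the behaviour.

```python
import re

# Hypothetical script text; the second statement is added to show over-matching.
script_text = """var appsTableData=[[<"<a href='Something'/>"]];
var otherVar=1;"""

# Lazy quantifier with nothing after it: it matches as little as possible, which is nothing.
lazy = re.search(r"var appsTableData=(.*?)", script_text)
print(repr(lazy.group(1)))

# Greedy quantifier with DOTALL: it runs on to the end of the text, past the data we want.
greedy = re.search(r"var appsTableData=(.*)", script_text, re.DOTALL)
print(repr(greedy.group(1)))

# Lazy quantifier plus an explicit stop: the closing "]]" followed by ";".
anchored = re.search(r"var appsTableData=(.*?\]\]);", script_text, re.DOTALL)
print(anchored.group(1))
```

The right stopping point depends on what follows the variable in the real page, for example a semicolon, a newline, or the closing script tag.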