Python Beautiful Soup | Extracting a variable from JavaScript
Tags: python, regex, web-scraping, beautifulsoup
I am using BeautifulSoup to scrape data from an HTML page that has several columns under the table body. Please see the mock code below:
from bs4 import BeautifulSoup
import requests
import urllib.request as urllib2
import re
import json
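# myUrl is a placeholder for the URL of the page being scraped
app_page = urllib2.urlopen(myUrl)
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(app_page, "html.parser")
print(soup.prettify())
# The variable lives in the eighth <script> tag on this page
data = soup.find_all("script")[7]
data = re.sub("\n", "", str(data))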
print(data)
Output:
var appsTableData=[[
Use .string to get the text of the script tag, then remove the prefix with str.replace.
Ex:
data = soup.find_all("script")[7].string
print(data.replace("var appsTableData=", ""))
Output:
[[<"<a href='Something'/>"]]
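The question imports json, so the next step is presumably to turn the extracted text into a Python list. A minimal sketch of that step, assuming the value is valid JSON (the sample output above is not, so the real page may need extra cleanup first). The HTML string and its contents are made up for illustration, and the script index is 0 here rather than 7 because the sample page has a single script tag.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical page; the real appsTableData value may not be valid JSON.
html = """<html><body>
<script type="text/javascript">var appsTableData=[["<a href='Something'/>", "Name", 3]];</script>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")
script_text = soup.find_all("script")[0].string

# Drop the assignment prefix and the trailing semicolon, leaving only the literal.
literal = script_text.replace("var appsTableData=", "").strip().rstrip(";")

# json.loads raises json.JSONDecodeError if the literal is not valid JSON.
rows = json.loads(literal)
print(rows[0][0])  # <a href='Something'/>
```

If json.loads fails on the real page, the literal probably uses JavaScript syntax that JSON does not allow, and it would need to be cleaned up or parsed another way.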
Using BeautifulSoup and re.compile:
data = '''<script type="text/javascript"> var appsTableData=[[<"<a href='Something'/>"]]</script>'''
soup = BeautifulSoup(data, "html.parser")
withbs = soup.find('script', string=re.compile('var appsTableData'))
withbs = withbs.text.replace('var appsTableData=', '').strip()
print(withbs)
Result:
[[<"<a href='Something'/>"]]
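The same idea can be wrapped in a small helper that copes with a missing script tag and with optional whitespace around the equals sign. This is only a sketch: the function name extract_js_assignment and the sample HTML are hypothetical, not part of the original answer.

```python
import re
from bs4 import BeautifulSoup

def extract_js_assignment(html, var_name):
    """Return the raw text assigned to var_name in the first matching <script>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Tolerate "var x=", "var x =", "var  x   =" and so on.
    pattern = re.compile(r"var\s+" + re.escape(var_name) + r"\s*=")
    script = soup.find("script", string=pattern)
    if script is None:
        return None
    # Split once on the assignment and keep everything after it.
    value = pattern.split(script.string, maxsplit=1)[1]
    return value.strip().rstrip(";").strip()

# Hypothetical page with two script tags.
page = "<script> var appsTableData = [[1, 2]]; </script><script>var other = 5;</script>"
print(extract_js_assignment(page, "appsTableData"))  # [[1, 2]]
print(extract_js_assignment(page, "missing"))        # None
```

Searching by content rather than by position (`[7]`) keeps working if the page gains or loses a script tag.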
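Your question is hard to understand. The reason it fails is that *? is lazy. Changing it to greedy will not work either; you need something in the pattern that tells it where to stop matching.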
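The difference can be seen on a small made-up string. The asker's actual pattern is not shown, so the patterns below are illustrative, and the stop marker `]];` is an assumption about how the statement ends on the real page.

```python
import re

# Hypothetical script body with a second statement after the one we want.
js = 'var appsTableData=[["a", 1], ["b", 2]]; var other=[[9]];'

# Lazy with nothing after it: matches as little as possible, which is nothing.
lazy = re.search(r"var appsTableData=(.*?)", js)
print(repr(lazy.group(1)))  # ''

# Greedy: runs to the end of the line and swallows the next statement too.
greedy = re.search(r"var appsTableData=(.*)", js)
print(greedy.group(1))  # [["a", 1], ["b", 2]]; var other=[[9]];

# Lazy plus an explicit stop marker: ends at the first "]]" followed by ";".
anchored = re.search(r"var appsTableData=(.*?\]\]);", js, re.DOTALL)
print(anchored.group(1))  # [["a", 1], ["b", 2]]
```

re.DOTALL lets `.` cross newlines, which matters when the array spans several lines in the page source.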