Python: scraping data from a script (Python, Python 3.x)


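I'm trying to use Python's BeautifulSoup to extract the proportion of languages spoken at a company.

However, this information seems to come from a script rather than from the HTML, and I'm running into some trouble.

For example, on the following page, when I try: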

import requests
from bs4 import BeautifulSoup

webpage = "https://www.zippia.com/amazon-com-careers-487/"
page = requests.get(webpage)
soup = BeautifulSoup(page.content, 'lxml')

for links in soup.find_all('div', {'class':'companyEducationDegrees'}):
    raw_text = links.get_text()
    lines = raw_text.split('\n')
    print(lines)
    print('-------------------')

I don't get any results, whereas the desired result would be Spanish 61.1%, French 9.7%, and so on.
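For illustration, here is a minimal offline sketch of what the loop above actually sees. The HTML string is hypothetical, not Zippia's real markup: the visible block holds only template placeholders such as `{[{language.name}]}`, which JavaScript fills in inside a browser, while the numbers sit inside a script tag. BeautifulSoup parses the HTML exactly as the server sent it, so `get_text()` returns the placeholders rather than the percentages. It assumes `beautifulsoup4` is installed and uses the built-in `html.parser` so that `lxml` is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical page source: the visible block holds template placeholders that
# JavaScript would fill in, while the real figures sit inside a <script> tag.
HTML = """
<html><body>
  <div class="companyEducationDegrees">
    <span>{[{language.name}]}</span><span>{[{language.percentage}]}%</span>
  </div>
  <script type="text/javascript">
    getCompanyInfo({"companyDiversity": {"languages": [{"name": "Spanish", "percentage": 61.1}]}});
  </script>
</body></html>
"""


def visible_block_text(html):
    """Return the text of the companyEducationDegrees div as BeautifulSoup sees it."""
    soup = BeautifulSoup(html, "html.parser")  # stdlib parser, so lxml is not required
    block = soup.find("div", {"class": "companyEducationDegrees"})
    return block.get_text(strip=True) if block else ""


print(visible_block_text(HTML))  # {[{language.name}]}{[{language.percentage}]}%
print("61.1" in HTML)            # True: the number is in the page source, just not in the div
```

The takeaway is that the data is present in the downloaded page, only not where the question's loop looks for it.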

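What result are you getting?

[,{[{language.name}]}{[{language.percentage}]}%',,,]------------[,{[{degree.name}]}{[{degree.percentage}]}%',,,,]------------ This is what I got.

@hard1009 The data is loaded by JavaScript, so you have to use something that emulates a browser, such as PhantomJS.

As you have already found, the data is put into the page by JavaScript. You can still get it, however, because the company's full data is always delivered with the page, embedded in a script tag. You can access it with requests + BeautifulSoup + json + re: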
import json
import re

import requests
from bs4 import BeautifulSoup

webpage = "https://www.zippia.com/amazon-com-careers-487/"
page = requests.get(webpage)
page.raise_for_status()
soup = BeautifulSoup(page.content, 'lxml')

for script in soup.find_all('script', {'type': 'text/javascript'}):
    if 'getCompanyInfo' in script.text:
        # The company data is embedded as a JSON object on one line of the script:
        # take everything from the first '{' to the last '}' on that line.
        match = re.search(r"\{[^\n]*\}", script.text)
        if match is None:
            continue
        data = json.loads(match.group())
        print(data["companyDiversity"]["languages"])

        # Optional: write the data to a readable file, e.g. to find the path to an entry
        with open("test.json", "w") as f:
            json.dump(data, f, indent=2)
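The regex above grabs everything from the first `{` to the last `}` on the line, so it only parses when the JSON object is the last thing on that line. A possible more tolerant variant, sketched below with the standard library only, swaps the regex for `json.JSONDecoder.raw_decode`, which stops at the end of the first complete JSON value, and then formats the languages the way the question asks. The script text is a hypothetical sample (reusing the figures quoted in the question), and the `name`/`percentage` keys are an assumption inferred from the page's `{[{language.name}]}` and `{[{language.percentage}]}` placeholders; the real structure of `companyDiversity` may differ.

```python
import json
import re


def first_json_object(script_text):
    """Decode the JSON object that starts at the first '{' in script_text.

    Like the answer's regex, this assumes the first '{' opens the data object.
    raw_decode stops after one complete JSON value, so trailing JavaScript on
    the same line is ignored. Returns None if there is no '{' or it does not
    start valid JSON.
    """
    brace = script_text.find("{")
    if brace == -1:
        return None
    try:
        obj, _end = json.JSONDecoder().raw_decode(script_text[brace:])
    except json.JSONDecodeError:
        return None
    return obj


def format_languages(data):
    """Return entries like 'Spanish 61.1%' from data['companyDiversity']['languages'].

    Assumption: each entry is a dict with 'name' and 'percentage' keys.
    """
    languages = data.get("companyDiversity", {}).get("languages", [])
    return [f"{lang['name']} {lang['percentage']}%" for lang in languages]


# Hypothetical script text: the JSON object is followed by more code on the same line.
SCRIPT = ('getCompanyInfo({"companyDiversity": {"languages": ['
          '{"name": "Spanish", "percentage": 61.1}, '
          '{"name": "French", "percentage": 9.7}]}}); init({"ready": true});')

# The greedy pattern runs on to the last '}' of the line, so json.loads rejects it.
greedy = re.search(r"\{[^\n]*\}", SCRIPT).group()
try:
    json.loads(greedy)
except json.JSONDecodeError as err:
    print("greedy regex failed:", err.msg)

data = first_json_object(SCRIPT)
print(format_languages(data))  # ['Spanish 61.1%', 'French 9.7%']
```

On the live page, the `script.text` found by the loop above would be passed to `first_json_object` in place of the sample string.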