Installing dryscrape on Ubuntu Server 16.04
Tags: javascript, python, ubuntu, web-scraping, dryscrape


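I am having trouble getting dryscrape to work on an Ubuntu 16.04 server (a clean install on DigitalOcean). The goal is to scrape websites whose content is populated by JavaScript.

I followed the dryscrape installation instructions.

I then ran the Python script below, which I found at the same link together with a test HTML page, to check whether it returns the plain HTML or the JavaScript-rendered text.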

Python

import dryscrape
from bs4 import BeautifulSoup
session = dryscrape.Session()
my_url = 'http://www.example.com/scrape.php'
session.visit(my_url)
response = session.body()
soup = BeautifulSoup(response)
soup.find(id="intro-text")
HTML-scrape.php

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Javascript scraping test</title>
</head>
<body>
  <p id='intro-text'>No javascript support</p>
  <script>
     document.getElementById('intro-text').innerHTML = 'Yay! Supports javascript';
  </script> 
</body>
</html>
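
To make the purpose of the test page concrete, here is a minimal sketch of what a scraper that does not execute JavaScript would see. It uses only the standard library's `html.parser` (Python 3) rather than dryscrape or BeautifulSoup; `TEST_PAGE`, `IntroTextParser`, and `static_intro_text` are hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser

# A trimmed copy of the test page from the question.
TEST_PAGE = """<!DOCTYPE html>
<html>
<body>
  <p id='intro-text'>No javascript support</p>
  <script>
     document.getElementById('intro-text').innerHTML = 'Yay! Supports javascript';
  </script>
</body>
</html>"""


class IntroTextParser(HTMLParser):
    """Collects the text inside the <p> element with id="intro-text"."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        # Start collecting once the target element opens.
        if dict(attrs).get("id") == "intro-text":
            self.inside = True

    def handle_endtag(self, tag):
        # The target element is a <p>, so stop at its closing tag.
        if tag == "p":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text += data


def static_intro_text(html):
    """Return the intro text as seen without running any script."""
    parser = IntroTextParser()
    parser.feed(html)
    return parser.text


print(static_intro_text(TEST_PAGE))
```

Because the script is never run here, the paragraph keeps its original text. A session that does run the script, which is what dryscrape is meant to provide, should report the replaced text instead.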

You are not running an X server. The clue is

Try calling dryscrape.start_xvfb() before creating the session.

xvfb (only needed if no other X server is available)

So you can add:

dryscrape.start_xvfb()
before:

session = dryscrape.Session()
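
If the same script also has to run on machines that already have a display, the decision to start xvfb can be isolated in a small helper. The sketch below is an assumption-laden illustration, not part of dryscrape: `needs_xvfb` is a hypothetical name, and treating an unset `DISPLAY` environment variable as "no X server available" is a heuristic.

```python
import os
import sys


def needs_xvfb(platform=None, environ=None):
    """Hypothetical helper: True on Linux when no DISPLAY is set.

    Assumption: an empty or missing DISPLAY variable means that no
    X server is available to the current process.
    """
    platform = sys.platform if platform is None else platform
    environ = os.environ if environ is None else environ
    return "linux" in platform and not environ.get("DISPLAY")


# Examples with explicit arguments, so they do not depend on this machine.
print(needs_xvfb("linux", {}))
print(needs_xvfb("linux", {"DISPLAY": ":0"}))
print(needs_xvfb("darwin", {}))
```

With dryscrape installed, the call site would be `if needs_xvfb(): dryscrape.start_xvfb()`, placed before `session = dryscrape.Session()`.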

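I have added an updated, working Python script at the bottom of the answer. The only other change I needed was to specify the HTML parser:
soup = BeautifulSoup(response, "html.parser")
Thank you very much for your help; I spent four hours yesterday reading and trying to solve this.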
import dryscrape
from bs4 import BeautifulSoup

dryscrape.start_xvfb()
session = dryscrape.Session()
my_url = 'https://www.example.com/scrape.php'
session.visit(my_url)
response = session.body()
soup = BeautifulSoup(response, "html.parser")
print(soup.find(id="intro-text").text)

Starting xvfb only on Linux:

import sys
import dryscrape

if 'linux' in sys.platform:
    # start xvfb in case no X is running. Make sure xvfb
    # is installed, otherwise this won't work!
    dryscrape.start_xvfb()
session = dryscrape.Session()