Python: Getting information from Ghost.py


I am working on a project where I need to get information from a web page. I am using Python and Ghost. I saw the following code in the documentation:

links = gh.evaluate("""
                    var linksobj = document.querySelectorAll("a");
                    var links = [];
                    for (var i=0; i<linksobj.length; i++){
                        links.push(linksobj[i].value);
                    }
                    links;
                """)

Edit: After reading Padraic Cunningham's answer, I realize that I regrettably misunderstood your question. I will leave my answer here anyway for future reference, or for possible downvotes. :p
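If the output you receive is a string, use Python's common string operations to get the output you describe in your question.
You receive: title>this is title of the webpage

You want: this is title of the webpage

Assuming the output you receive always has the same format, you can use the following string operations to get the desired output.
Using split: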
>>> s = 'title>this is title of the webpage'
>>> p = s.split('>')
>>> p
['title', 'this is title of the webpage']
>>> p[1]
'this is title of the webpage'

Here p is a list, so you have to access the element that contains the desired output.
A simpler way is to take a substring:
>>> s = 'title>this is title of the webpage'
>>> p = s[6:]
>>> p
'this is title of the webpage'

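In the snippet above, p = s[6:] means you want everything in the string 'title>this is title of the webpage' from the 7th character to the end. In other words, you skip the first 6 characters.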
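If the output you receive is not always in the same format, fixed-position slicing will not be reliable, and you may prefer a more flexible approach.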
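For output whose format varies, here is a minimal sketch of two alternatives to a hard-coded index. The helper names text_after_marker and title_text are hypothetical and introduced only for illustration. The first splits on the first '>' wherever it occurs. The second uses a regular expression and assumes the text of interest follows a "title>" marker, with or without the surrounding angle bracket and closing tag.

```python
import re


def text_after_marker(s, marker=">"):
    """Return the part of s after the first marker, or s unchanged if the marker is absent."""
    head, sep, tail = s.partition(marker)
    return tail if sep else s


def title_text(s):
    """Return the text after 'title>' up to '</title>' or the end of the string, else None."""
    match = re.search(r"<?title>(.*?)(?:</title>|$)", s, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


print(text_after_marker("title>this is title of the webpage"))
print(title_text("<title>this is title of the webpage</title>"))
```

Unlike s[6:], neither helper depends on the marker being exactly six characters long, so a change in the prefix does not silently cut into the text.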
Your second question has already been answered in the comments.
I hope I understood your question correctly.
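Use requests and BeautifulSoup.
Ghost.py is a webkit client. It allows you to load a web page and interact with its DOM and runtime.
This means that once you have everything installed and running, you can simply do the following: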
from ghost import Ghost
ghost = Ghost()
page, resources = ghost.open('http://stackoverflow.com/')
if page.http_status == 200:
    result, extra = ghost.evaluate('document.title;')
    print('The title is: {}'.format(result))

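If the output you receive is a string, I think you should look at the common string operations in Python. You can strip, split, and do many other things with a string.
@PadraicCunningham Your analysis looks correct to me as well. I want to get a string out of a string. When I posted the question I tried to write it that way, but it did not work: the title tag was stripped automatically.
@user1934948 My answer gets the string.
I don't see how this solves this person's problem, since he is not using requests or BeautifulSoup.
@BurhanKhalid He is using Python, where both are available. Maybe the OP does not know about them, or are we assuming that every SO member knows all the libs?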

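The requests and BeautifulSoup version:
import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.google.com/")
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(r.text, "html.parser")
In [3]: soup.title.string
Out[3]: u'Google'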
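If the HTML is already available as a string and installing third-party packages is not an option, the same title lookup can be sketched with the standard library alone. This is only an illustration under that assumption: TitleParser, extract_title, and the sample markup are hypothetical names and data, not part of requests, BeautifulSoup, or Ghost.py.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is of interest
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        return "".join(self._parts).strip()


def extract_title(html):
    """Return the page title, or an empty string if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


sample = "<html><head><title>this is title of the webpage</title></head><body></body></html>"
print(extract_title(sample))
```

Because it parses the markup instead of slicing the string, the tag itself never ends up in the result.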
