Python duckduckgo API not returning results




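Edit: I now realize the API is not sufficient and doesn't even work. I'd like to redirect my question: I want to be able to automagically search DuckDuckGo using their "I'm feeling ducky". For example, I could search for "stackoverflow" and get the home page ("").

I'm using the duckduckgo API.

I found that when using: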

import duckduckgo

r = duckduckgo.query("example")

the results don't reflect a manual search, i.e.:

for result in r.results:
    print(result)

Output:

>>> 
>>>

Nothing.

Indexing into results raises an out-of-range error (IndexError), because the list is empty.

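How should I get the search results?

It seems the API (according to its documented examples) is supposed to answer questions and return the answer in r.answer.text

But the site is built in such a way that I can't search it and parse the results the usual way.

I'd like to know how to parse the search results, either with this API or with any other method for this site.

Thanks.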

Try:

for result in r.results:
    print(result.text)

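If you visit the API page, you'll find some notes on using the API. The first note clearly states:

As this is a Zero-click Info API, most deep queries (non topic names) will be blank.

And here's a list of those fields: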

Abstract: ""
AbstractText: ""
AbstractSource: ""
AbstractURL: ""
Image: ""
Heading: ""
Answer: ""
Redirect: ""
AnswerType: ""
Definition: ""
DefinitionSource: ""
DefinitionURL: ""
RelatedTopics: [ ]
Results: [ ]
Type: ""
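Since most of these fields come back empty for deep queries, it helps to see which ones a given response actually filled. A minimal sketch, assuming you already have the decoded JSON response as a dict (for example from https://api.duckduckgo.com/?q=...&format=json); the helper name non_empty_fields and the sample payload are made up for illustration:

```python
def non_empty_fields(payload):
    """Return only the fields of an Instant Answer response that carry data.

    Empty strings, empty lists and None (the usual case for deep queries) are dropped.
    """
    return {key: value for key, value in payload.items() if value not in ("", [], None)}


# Hand-written stand-in shaped like the field list above, not real API output.
sample = {
    "Abstract": "", "AbstractText": "", "Heading": "Example",
    "Answer": "", "Redirect": "", "RelatedTopics": [], "Results": [], "Type": "D",
}

print(non_empty_fields(sample))  # {'Heading': 'Example', 'Type': 'D'}
```

An empty result from this check is a quick sign that the query is one the zero-click API simply does not answer.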

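So it's a pity, but their API simply truncates a lot of results and doesn't give them to you, probably so that it works faster, and it seems nothing can be done about that except using the site itself.

So in this case the API is clearly not the way to go.

As for me, I see only one way out: retrieve the raw HTML from DuckDuckGo and parse it with, for example, BeautifulSoup (it's worth mentioning that their HTML is well structured).

It's also worth mentioning that parsing HTML pages is not the most reliable way to scrape data, because the HTML structure can change, whereas an API usually stays stable until changes are publicly announced.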

Here is an example of how such parsing can be implemented with BeautifulSoup:

This script prints:

u'Eixample, an inner suburb of Barcelona with distinctive architecture'

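The problem with querying the main page directly is that it uses JavaScript to generate the results you want (as opposed to the related topics), so you can only get those results from the HTML version. The HTML version has a different link:

# JavaScript version
# HTML-only version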
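When these URLs are built from user input, the query should be URL-encoded so that spaces and characters such as ! or & survive. A minimal sketch using only the standard library; the helper name search_urls is hypothetical, and the base URLs follow this answer's pattern (the HTML-only version adds /html/, the JavaScript version is the same URL without it):

```python
from urllib.parse import urlencode

BASE = "http://duckduckgo.com"


def search_urls(query):
    """Return (javascript_url, html_only_url) for a DuckDuckGo query."""
    params = urlencode({"q": query})  # "hello world" -> "q=hello+world"
    return f"{BASE}/?{params}", f"{BASE}/html/?{params}"


js_url, html_url = search_urls("example")
print(js_url)    # http://duckduckgo.com/?q=example
print(html_url)  # http://duckduckgo.com/html/?q=example
```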

Let's see what we can get:

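import re
import urllib  # Python 2; in Python 3 use urllib.request.urlopen
from bs4 import BeautifulSoup

site = urllib.urlopen('http://duckduckgo.com/html/?q=example')
data = site.read()
parsed = BeautifulSoup(data, 'html.parser')

first_link = parsed.findAll('div', {'class': re.compile('links_main*')})[0].a['href']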

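The result stored in the first_link variable is the link to the first result (not a related search) returned by the search engine.

To get all the links, you can iterate over the found tags (data other than links can be extracted in a similar way).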
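Iterating over every matching tag is the same idea as first_link, just without the [0]. To keep the sketch dependency-free it swaps BeautifulSoup for the standard library's html.parser; the class name links_main comes from the code above, while the parser class name and the sample HTML are invented for illustration (DuckDuckGo's real markup may differ or change):

```python
from html.parser import HTMLParser


class ResultLinkParser(HTMLParser):
    """Collect the href of the first <a> inside each div whose class contains 'links_main'."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._in_result = False  # True after a matching div opens, until its first <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and "links_main" in (attrs.get("class") or ""):
            self._in_result = True
        elif tag == "a" and self._in_result and attrs.get("href"):
            self.links.append(attrs["href"])
            self._in_result = False  # only the first link per result, like .a['href']


sample_html = """
<div class="results">
  <div class="links_main result"><a href="http://example.com/">Example Domain</a></div>
  <div class="links_main result"><a href="http://example.org/">Another</a></div>
</div>
"""

parser = ResultLinkParser()
parser.feed(sample_html)
print(parser.links)  # ['http://example.com/', 'http://example.org/']
```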

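Note that the HTML-only version contains only the results; for related searches you have to use the JavaScript version (the URL without the html part).

If it suits your application, you could also try the related searches: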

import duckduckgo

r = duckduckgo.query("example")
for i in r.related_searches:
    if i.text:
        print(i.text)

This produces:

Eixample, an inner suburb of Barcelona with distinctive architecture
Example (musician), a British musician
example.com, example.net, example.org, example.edu  and .example, domain names reserved for use in documentation as examples
HMS Example (P165), an Archer-class patrol and training vessel of the British Royal Navy
The Example, a 1634 play by James Shirley
The Example (comics), a 2009 graphic novel by Tom Taylor and Colin Wilson
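The same related-topic texts also appear in the raw JSON of the Instant Answer API under RelatedTopics, where some entries are named groups holding their own Topics list. A minimal sketch that flattens them, assuming that response shape; related_texts is a hypothetical helper and the payload is a hand-written stand-in, not real API output:

```python
def related_texts(payload):
    """Yield the Text of every related topic, descending into grouped topics."""
    for item in payload.get("RelatedTopics", []):
        if "Topics" in item:  # a named group of topics
            for sub in item["Topics"]:
                if sub.get("Text"):
                    yield sub["Text"]
        elif item.get("Text"):
            yield item["Text"]


sample = {
    "RelatedTopics": [
        {"Text": "Example (musician), a British musician"},
        {"Name": "Ships", "Topics": [{"Text": "HMS Example (P165)"}]},
        {"Text": ""},  # empty entries are skipped, like the `if i.text` check above
    ]
}

for text in related_texts(sample):
    print(text)
```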

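After I had already received an answer to my question, accepted it and awarded the bounty, I found a different solution, which I'd like to add here for completeness. Many thanks to everyone who helped me reach it. Even though it isn't the solution I asked for, it may help someone in the future.

I found it after a long and painful discussion on this site and a few support emails.

Here is the solution code (from the answer in the post mentioned above):

For Python 3 users, a transcription of @Rostyslav Dzinko's code: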

import re
import urllib.parse
import urllib.request
from bs4 import BeautifulSoup

query = "your query"
# URL-encode the query so spaces and special characters survive
site = urllib.request.urlopen("http://duckduckgo.com/html/?q=" + urllib.parse.quote_plus(query))
data = site.read()
soup = BeautifulSoup(data, "html.parser")

# First 15 web results inside the #links container
my_list = soup.find("div", {"id": "links"}).find_all("div", {"class": re.compile(".*web-result*.")})[0:15]

result__snippet = []
result_url = []

for i in my_list:
    # A result may lack a snippet or URL element; store None in that case
    snippet = i.find("a", {"class": "result__snippet"})
    result__snippet.append(snippet.get_text().strip() if snippet else None)
    url = i.find("a", {"class": "result__url"})
    result_url.append(url.get_text().strip() if url else None)
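Some sites respond to urllib's default Python-urllib/x.y User-Agent differently from a browser, or refuse it outright. If the HTML endpoint returns nothing useful, one thing to try is sending an explicit header. A minimal sketch; the header value and the helper name build_request are assumptions, and nothing is fetched until the request is passed to urlopen:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(query, user_agent="Mozilla/5.0"):
    """Build (but do not send) a request for the HTML-only results page."""
    url = "http://duckduckgo.com/html/?" + urlencode({"q": query})
    return Request(url, headers={"User-Agent": user_agent})


req = build_request("your query")
print(req.full_url)                  # http://duckduckgo.com/html/?q=your+query
print(req.get_header("User-agent"))  # Mozilla/5.0 (urllib stores the key as "User-agent")

# To actually fetch the page (needs network access): data = urlopen(req).read()
```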

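Same result, nothing. The problem is that r.results is an empty array; the API doesn't return any results at all. r.related returns related searches/queries, which is not what I'm trying to get... even though in some cases it may be useful.

Apparently this is a kind of "duct-tape solution": if you try it directly, you also get empty results.

Right, but apparently my code doesn't search for "example", and most other queries don't return any results either.

Thanks. This helped me understand where the problem is. Where did you find it? :P I tried writing a parser for DuckDuckGo's regular HTML page, but I ran into problems because it uses JavaScript or something, and the results don't show up as proper HTML...

It works fine for me with BeautifulSoup. Will update the answer.

Oh, that was wrong: the results you got come from the related searches.

It's just an example showing that the page is consistent HTML; you can do the same to get all the other results.

So using the HTML page, I can get more than one result?

The link seems to be dead.

Yes, seems so. Sorry. The gist of the post I linked is what I posted here; most of the rest is just back-and-forth about these issues.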


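Solution code (prefixing the query with ! asks DuckDuckGo for the "I'm feeling ducky" redirect instead of a list of results):

>>> import duckduckgo
>>> print duckduckgo.query('! Example').redirect.url
http://www.iana.org/domains/example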
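The same ! trick can be tried without the duckduckgo wrapper by requesting the Instant Answer API's JSON and reading its Redirect field (one of the fields listed earlier). A minimal sketch, assuming the format=json and no_redirect=1 query parameters (the latter asks the API to report a ! redirect instead of following it); the helper names are hypothetical, and the network call is kept separate so the parsing can be checked offline:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://api.duckduckgo.com/"


def ducky_api_url(query):
    """URL for an Instant Answer API request with a '!' prefix."""
    return API + "?" + urlencode({"q": "! " + query, "format": "json", "no_redirect": 1})


def redirect_from(payload):
    """Return the Redirect URL from a decoded response, or None if it is empty."""
    return payload.get("Redirect") or None


def ducky_url(query):
    """Fetch and return the 'I'm feeling ducky' target (needs network access)."""
    with urlopen(ducky_api_url(query)) as response:
        return redirect_from(json.load(response))


print(ducky_api_url("Example"))
# https://api.duckduckgo.com/?q=%21+Example&format=json&no_redirect=1
print(redirect_from({"Redirect": "http://www.iana.org/domains/example"}))  # stand-in payload
```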
