
Python: unable to scrape links from DuckDuckGo search results


I want to scrape the first link from DuckDuckGo search results. I wrote the following code:

import requests
from bs4 import BeautifulSoup

class Bse:
    def currentPrice(self, symbol):
        headers = {
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:84.0) Gecko/20100101 Firefox/84.0"
        }
        duckDuckUrl = f'https://duckduckgo.com/?q=bse+{symbol}+stock+price'
        response = requests.get(duckDuckUrl, headers=headers)
        soup = BeautifulSoup(response.text, "html.parser")
        bseIndiaLink = soup.find_all('a')
        # bseIndiaLink = soup.find_all('a', class_="result__a")  # giving empty list
        print(bseIndiaLink)


bse = Bse()
bse.currentPrice('reliance')
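First, I used find_all() in BeautifulSoup without the class_ parameter. It returned a list of seemingly random anchor tags that are of no use to me. I also tried find_all() with the class_ parameter, but it returned an empty list.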

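I tried printing the soup object. It prints the page's HTML, but not the divs that contain the results. I don't understand why BeautifulSoup doesn't pick up the result divs. See the screenshot; the highlighted HTML is what I want: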

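I found out that DuckDuckGo loads its search results with JavaScript, and BeautifulSoup cannot execute JavaScript. However, in other StackOverflow posts I have seen people get links from the search results.
And if I use Google instead of DuckDuckGo, I am able to get the link I need.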
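One way to test the JavaScript explanation is to check whether the result anchors appear at all in the raw HTML that `requests` returns. The sketch below uses the standard-library `html.parser` rather than BeautifulSoup, and assumes the result links carry the class `result__a` (the class name used in this thread); the helper names and the sample HTML strings are hypothetical. An empty list for the page fetched from `https://duckduckgo.com/?q=...` would be consistent with the results being inserted by JavaScript after the page loads.

```python
from html.parser import HTMLParser


class ResultLinkCollector(HTMLParser):
    """Collects href values of <a> tags whose class list contains target_class."""

    def __init__(self, target_class="result__a"):
        super().__init__()
        self.target_class = target_class
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # A bare attribute has the value None, so fall back to an empty string
        classes = (attr_map.get("class") or "").split()
        if self.target_class in classes and attr_map.get("href"):
            self.hrefs.append(attr_map["href"])


def result_links(html, target_class="result__a"):
    """Return the hrefs of matching anchors present in the static HTML."""
    parser = ResultLinkCollector(target_class)
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Hypothetical samples: a JavaScript shell page vs. server-rendered results
js_shell = '<html><body><div id="links"></div><script src="/d.js"></script></body></html>'
static = ('<div class="result"><h2 class="result__title">'
          '<a rel="nofollow" class="result__a" href="https://example.com/first">First</a>'
          '</h2></div>')

print(result_links(js_shell))  # []
print(result_links(static))    # ['https://example.com/first']
```

Calling `result_links(response.text)` on each fetched page shows which URL actually serves the results in its static HTML.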

I'd like to know why I can't scrape DuckDuckGo when the same code works for Google. I'm curious.

If anyone knows what I'm missing, please let me know. It will help me in my learning journey.


Thanks

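The following returns results for your current search keywords. You need to send an HTTP POST request with the appropriate parameters to https://html.duckduckgo.com/html/ to get the content. To make your current attempt work, I used string formatting in the payload.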
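For reference, passing a dict as `data=` to `requests.post` sends it form-encoded (`application/x-www-form-urlencoded`), the same encoding `urllib.parse.urlencode` produces. A minimal sketch of the body the code below sends, assuming the `'bse {} stock price'` template and the empty `'b'` field from the answer; `form_body` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def form_body(symbol, template="bse {} stock price"):
    # Build a fresh payload for each symbol so the template itself is never overwritten
    payload = {"q": template.format(symbol), "b": ""}
    # Spaces become '+', as in an HTML form submission
    return urlencode(payload)


print(form_body("reliance"))  # q=bse+reliance+stock+price&b=
```

Building the payload per call also means a second lookup with a different symbol gets its own query instead of reusing the first one.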

import requests
from bs4 import BeautifulSoup

class Bse:
    def __init__(self):
        # The HTML-only version of DuckDuckGo includes the results in the returned page
        self.duckDuckUrl = 'https://html.duckduckgo.com/html/'
        self.query_template = 'bse {} stock price'
        self.headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:84.0) Gecko/20100101 Firefox/84.0'}

    def currentPrice(self, symbol):
        # Build a fresh payload per call so repeated calls format the template correctly
        payload = {'q': self.query_template.format(symbol), 'b': ''}
        res = requests.post(self.duckDuckUrl, data=payload, headers=self.headers)
        soup = BeautifulSoup(res.text, 'html.parser')
        # Result title links carry the class "result__a"
        first = soup.find('a', class_='result__a')
        return first.get("href") if first else None

if __name__ == '__main__':
    bse = Bse()
    print(bse.currentPrice('reliance'))
Using a GET request:

import requests
from bs4 import BeautifulSoup

link = "https://html.duckduckgo.com/html/"
query_template = 'nse {} stock price'

def fetch_first_link(s, symbol):
    # Build the query per call instead of overwriting a shared dict
    params = {'q': query_template.format(symbol)}
    res = s.get(link, params=params)
    soup = BeautifulSoup(res.text, "lxml")  # requires the lxml package
    item = soup.select_one(".result__title > a.result__a")
    return item.get("href") if item else None

if __name__ == '__main__':
    with requests.Session() as s:
        s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
        print(fetch_first_link(s, 'reliance'))

Try this URL
@artanik it's showing me this error: requests.exceptions.MissingSchema: Invalid URL 'html.duckduckgo.com/html/?q=nse%20reliance%20stock%20price': No schema supplied. Perhaps you meant?
It's working now. But why use a POST request? Why not a GET request?
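The GET version in this thread also works: the same `q` field travels in the query string instead of the request body. The `MissingSchema` error comes from a URL without `https://`, which `requests` rejects. A sketch using only the standard library, with a hypothetical `build_get_url` helper that mirrors that scheme check (it raises `ValueError`, not the `requests` exception):

```python
from urllib.parse import urlencode, urlsplit


def build_get_url(base, query):
    # requests rejects URLs without a scheme (MissingSchema); mirror that check here
    if not urlsplit(base).scheme:
        raise ValueError(f"URL has no scheme: {base!r}")
    # The field the POST version sends in the body goes into the query string instead
    return f"{base}?{urlencode({'q': query})}"


print(build_get_url("https://html.duckduckgo.com/html/", "nse reliance stock price"))
# https://html.duckduckgo.com/html/?q=nse+reliance+stock+price
```

Passing `params=` to `requests.get`, as the GET answer does, builds an equivalent query string, so the choice between POST and GET here mostly comes down to where the form field is carried.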