Express 如何避免刮削goodreads时出现403响应?

Express 如何避免刮削goodreads时出现403响应?,express,asynchronous,xpath,web-scraping,cheerio,Express,Asynchronous,Xpath,Web Scraping,Cheerio,我正在使用cheerio废弃goodreads.com示例页面:。代码如下。代码最初是有效的,我从服务器获得了所需的响应。然而,在两次成功的尝试之后,goodreads的http响应返回403禁止。几分钟后,禁止的反应消失了。有没有办法修复这种行为?多个请求的速率限制 require('dotenv').config(); const cheerio = require('cheerio'); const axios = require('axios'); var xpath = requir

我正在使用cheerio废弃goodreads.com示例页面:。代码如下。代码最初是有效的,我从服务器获得了所需的响应。然而,在两次成功的尝试之后,goodreads的http响应返回403禁止。几分钟后,禁止的反应消失了。有没有办法修复这种行为?多个请求的速率限制

require('dotenv').config();
const cheerio = require('cheerio');
const axios = require('axios');

var xpath = require('xpath'),
  dom = require('xmldom').DOMParser;

module.exports = {
  async searchBooks(req, res, next) {
    const booksArray = [];
    const { searchTerm, searchField } = req.body;
    axios
      .get('https://www.goodreads.com/search/index.xml', {
        params: {
          q: searchTerm,
          page: null,
          key: process.env.GOODREADS_KEY,
          'search[field]': searchField
        }
      })
      .then(result => {
        const xml = result.data;
        var doc = new dom().parseFromString(xml);
        const idsArr = xpath.select('//best_book/id', doc);
        const titlesArr = xpath.select('//title', doc);
        const namesArr = xpath.select('//name', doc);
        for (let i = 0; i < idsArr.length; i++) {
          const bookObject = {
            id: idsArr[i].firstChild.data,
            title: titlesArr[i].firstChild.data,
            author: namesArr[i].firstChild.data
          };
          booksArray.push(bookObject);
        }
        const urls = idsArr.map((id, index) => {
          return `https://www.goodreads.com/book/show/${id.firstChild.data}`;
        });
        return urls;
      })
      .then(async urls => {
        const promiseArray = urls.map(url => axios.get(url));
        const results = await Promise.all(promiseArray);
        const imgUrls = results.map((result, index) => {
          const $ = cheerio.load(result.data);
          const imgUrl = $('#coverImage').attr('src');
          return imgUrl;
        });
        for (let i = 0; i < imgUrls.length; i++) {
          booksArray[i].image = imgUrls[i];
        }
        return res.json(booksArray);
      });
  }
};

实际:

POST/api/goodreads/search--ms--
节点:63327 UnhandledPromisejectionWarning:错误:请求失败,状态代码403

我想这与他们的API的限制相同:

不要每秒多次请求任何方法。古德雷兹追踪所有人 开发人员提出的请求

您可能希望使用而不是刮取数据


这可能也会有所帮助。

感谢您提供的建议链接。我采纳了你的建议,将请求频率降低到不超过每秒1次,实际上是每5秒1次,这取得了一些成功。然而,在19次成功查询之后,再次触发来自goodreads的临时403禁止响应。我会采纳你的建议,寻找一个更合适的API。goodreads API提供的封面图片质量不高,有时甚至完全缺失。
[
  {
    "id": "49628",
    "title": "Cloud Atlas",
    "author": "David Mitchell",
    "image": "https://images.gr-assets.com/books/1406383769l/49628.jpg"
  },
  {
    "id": "6795",
    "title": "The Cloud Atlas",
    "author": "Liam Callanan",
    "image": "https://images.gr-assets.com/books/1388200445l/6795.jpg"
  },
  {
    "id": "6797",
    "title": "Cloud Atlas",
    "author": "Donald Platt",
    "image": "https://images.gr-assets.com/books/1165604675l/6797.jpg"
  },
  {
    "id": "9113096",
    "title": "Cloud Atlas (Novel)",
    "author": "Frederic P.  Miller",
    "image": "https://images.gr-assets.com/books/1348108862l/9113096.jpg"
  },
  {
    "id": "42964582",
    "title": "Cloud Atlas",
    "author": "Lana Wachowski, Lilly Wachowski",
    "image": "https://images.gr-assets.com/books/1543315136l/42964582.jpg"
  },
  {
    "id": "36514377",
    "title": "Cloud Atlas",
    "author": "Aileen Brennigan",
    "image": "https://images.gr-assets.com/books/1509527373l/36514377.jpg"
  },
  {
    "id": "6270907",
    "title": "International Cloud Atlas, Vol. 2",
    "author": "G.O.P. Obasi",
    "image": "https://images.gr-assets.com/books/1408629611l/6270907.jpg"
  },
  {
    "id": "17981504",
    "title": "Cloud Atlas (Web Toon/Manwa)",
    "author": "SIU",
    "image": "https://images.gr-assets.com/books/1369643486l/17981504.jpg"
  },
  {
    "id": "20336925",
    "title": "Cloud Atlas: A BookCaps Study Guide",
    "author": "BookCaps",
    "image": "https://images.gr-assets.com/books/1388272670l/20336925.jpg"
  },
  {
    "id": "16453357",
    "title": "Cloud Atlas (Novel)",
    "author": "Jesse Russell",
    "image": "https://images.gr-assets.com/books/1356198781l/16453357.jpg"
  },
  {
    "id": "24416536",
    "title": "Anarchici - Matrix, Cloud Atlas",
    "author": "Flavia Monceri",
    "image": "https://images.gr-assets.com/books/1420884508l/24416536.jpg"
  },
  {
    "id": "254004",
    "title": "The Cloud Atlas Of China",
    "author": "National Meteorological Service of China"
  },
  {
    "id": "36368685",
    "title": "Weather: An Illustrated History: From Cloud Atlases to Climate Change",
    "author": "Andrew Revkin",
    "image": "https://images.gr-assets.com/books/1530073964l/36368685.jpg"
  },
  {
    "id": "28288713",
    "title": "Histopias: From the Bible to Cloud Atlas",
    "author": "Dragos Moraru"
  },
  {
    "id": "1522255",
    "title": "International Cloud Atlas: Volume I--Manual on the Observations of Clouds and Other Meteors",
    "author": "World Meteorological Organization"
  },
  {
    "id": "18949690",
    "title": "Atlas Cloud and the Witch of the West (Atlas Cloud Saga)",
    "author": "L.M.J. Rayner",
    "image": "https://images.gr-assets.com/books/1385411062l/18949690.jpg"
  },
  {
    "id": "21898982",
    "title": "Cloud Atlas by David Mitchell l Summary & Study Guide",
    "author": "BookRags",
    "image": "https://images.gr-assets.com/books/1397316499l/21898982.jpg"
  },
  {
    "id": "3338750",
    "title": "Cloud Atlas I, II, III: For Piano",
    "author": "Toshi Ichiyanagi"
  },
  {
    "id": "34169332",
    "title": "Postmodernism and Time in David Mitchell's Ghostwritten and Cloud Atlas",
    "author": "Hoo-Ting Miranda Li"
  },
  {
    "id": "19675573",
    "title": "Postmodernist Intertextuality in David Mitchell's Cloud Atlas",
    "author": "Martina Hrubes",
    "image": "https://images.gr-assets.com/books/1387742749l/19675573.jpg"
  }
]