Python crawler: extracting data from a table


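I am trying to extract the customer review section from Agoda. The data I am interested in sits under the div with id "hotelreview panel", and includes the number of reviews written by each type of traveler (for example, business travelers) together with the KPI ratings for each traveler type (for example, value for money).

I have two questions:

(1) I cannot find the right table with BeautifulSoup's find function. A table with the class "customer-review-category-issues" exists, but the lookup always returns None.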

import csv

import requests
from bs4 import BeautifulSoup

HotelNames = ['grand-hyatt-taipei']

with open('agoda_hotel_reviews.csv', 'w', newline='') as csvfile:
    filewriter = csv.writer(csvfile, delimiter='|', lineterminator='\n')

    for iHotel in HotelNames:
        url = "http://www.agoda.com/" + iHotel + "/hotel/taipei-tw.html"
        res = requests.get(url)
        soup = BeautifulSoup(res.text, 'html.parser')

        # find() returns None when the table is absent from the downloaded HTML
        table_review = soup.find("table", {"class": "customer-review-category-issues"})
        if table_review is None:
            continue

        record_rev = []
        for row in table_review.find_all('tr'):
            col = row.find_all('td')
            if len(col) < 2:
                continue
            # Split the cell text into stripped, non-empty lines
            parts = [p.strip() for p in col[1].get_text().split('\n') if p.strip()]
            if len(parts) < 2:
                continue
            # extend() takes a single iterable, so pass both values in one list
            record_rev.extend([parts[0], parts[1]])

        # One row per hotel
        filewriter.writerow(record_rev)

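(2) When I switch between traveler types, how can I extract the KPIs so that the returned list looks like [All reviews, 35, 8.1, 9.2, 9.0, 9.1, 9.1, 9.1, 8.3, Business travelers, 10, 7.8, 8.6, 8.4, 8.6, 8.6, 7.2], that is, [traveler type, number of reviews, KPI 1 (value for money), KPI 2 (location), ... KPI 6]?

The problem is that the reviews and other parts of the page are loaded dynamically, through additional XHR requests to service APIs, so they are not part of the HTML that requests.get() downloads. If you open the browser's developer tools and filter for XHR requests only, you can see them.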
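A quick way to confirm that an element is missing from the downloaded HTML, before reaching for the developer tools, is to check whether its class name appears in the raw response text at all. The snippet below is only a sketch: the helper name appears_in_static_html and the sample string are made up for illustration, and in the real script the text to check would be res.text.

```python
def appears_in_static_html(html: str, marker: str) -> bool:
    """Return True if `marker` occurs anywhere in the raw HTML text."""
    return marker in html


# Hand-written stand-in for res.text: a page whose content is filled in
# later by JavaScript contains no trace of the review table.
static_html = '<div id="reviews"></div><script src="app.js"></script>'

print(appears_in_static_html(static_html, "customer-review-category-issues"))
```

If the check fails for the real page, no parser will find the table in that response, and the data has to come from wherever the page's scripts fetch it.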

If you plan to stick with requests and BeautifulSoup, you may want to simulate the requests to the "GetReviewCore" and "GetReviewComments" endpoints.
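Simulating such a request could look roughly like the sketch below. Everything specific in it is an assumption: the endpoint URL, the HTTP method, the payload fields and the JSON keys (travelerType, reviewCount, kpis, score) are hypothetical placeholders, and the real values have to be copied from the request and response shown in the developer tools. The session argument is expected to be a requests.Session.

```python
from typing import Any, Dict, List


def fetch_review_json(session: Any, url: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """POST `payload` to `url` and return the decoded JSON body.

    Whether the real endpoint expects POST or GET, and which fields it
    needs, is an assumption; check the request in the developer tools.
    """
    response = session.post(url, json=payload, timeout=10)
    response.raise_for_status()
    return response.json()


def flatten_group(group: Dict[str, Any]) -> List[Any]:
    """Turn one traveler-type entry into [type, review count, KPI 1, ...].

    The keys used here are hypothetical; rename them to match the real JSON.
    """
    row = [group["travelerType"], group["reviewCount"]]
    row.extend(kpi["score"] for kpi in group["kpis"])
    return row


# Hand-written sample in the hypothetical shape, not real Agoda output.
sample = {
    "travelerType": "Business travelers",
    "reviewCount": 10,
    "kpis": [
        {"name": "Value for money", "score": 7.8},
        {"name": "Location", "score": 8.6},
    ],
}
print(flatten_group(sample))
```

With a real session the call would be along the lines of fetch_review_json(requests.Session(), url, payload), once per traveler type, with each flattened row appended to the list that goes to the CSV writer.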

Alternatively, you can take a more "high-level" approach and automate a real browser.

This is a huge help! Thank you very much.