Python: searching for hotels on TripAdvisor. How to get hotels from all pages (e.g. pages 1 to 10) and store them?

python csv web-scraping

My code shows the first page of hotels. Why doesn't it show more?

import csv
import requests
from bs4 import BeautifulSoup

hotels = []
i = 0

url0 = 'https://www.tripadvisor.com/Hotels-g295424-Dubai_Emirate_of_Dubai-Hotels.html#EATERY_LIST_CONTENTS'
r = requests.get(url0)
data = r.text
soup = BeautifulSoup(r.text, "html.parser")

with open('hotels_Data.csv', 'wb') as file:
    for link in soup.findAll('a', {'property_title'}):
        print('https://www.tripadvisor.com/Hotels-g295424-' + link.get('href'))
        print(link.string)

    for i in range(20):
        while int(i) <= (20):
            i = str(i)
            url1 = 'https://www.tripadvisor.com/Hotels-g295424-oa' + i + '-Dubai_Emirate_of_Dubai-Hotels.html#EATERY_LIST_CONTENTS'
            r1 = requests.get(url1)
            data1 = r1.text
            soup1 = BeautifulSoup(data1, "html.parser")
            for link in soup1.findAll('a', {'property_title', 'price'}):
                print('https://www.tripadvisor.com/Hotels-g294212-' + link.get('href'))
                print(link.string)
                for link in soup.select("a.reference.internal"):
                    url1 = link["href"]
                    absolute_url = urljoin(base_url, url1)
                    print(url1, absolute_url)
                writer = csv.writer(file)
                for row in hotels:
                    writer.writerow([s.encode("utf-8") for s in row])
            break

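Check the link to the next page at the bottom of the page. This portal doesn't use page numbers (1, 2, 3, etc.); it uses offsets (0, 30, 60, 90, etc.), because it displays 30 offers per page.

So you have to use the values 0, 30, 60, 90, etc. in the URL:

"...-oa" + str(offset) + "-Dubai_Emirate..."

You can use, e.g., range(0, 250, 30) to get the values 0, 30, 60, 90, etc.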

import requests
from bs4 import BeautifulSoup

for offset in range(0, 250, 30):
    print('--- page offset:', offset, '---')

    url = 'https://www.tripadvisor.com/Hotels-g295424-oa' + str(offset) + '-Dubai_Emirate_of_Dubai-Hotels.html#EATERY_LIST_CONTENTS'

    r = requests.get(url)
    soup = BeautifulSoup(r.text, "html.parser")

    for link in soup.find_all('a', {'property_title'}):
        print(link.text)
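
The loop above only prints the hotel names. Since the question also asks how to store them, here is a minimal sketch of writing collected names to a CSV file in Python 3. The helper name `save_hotels`, the column headers, and the sample rows are assumptions for illustration and are not part of the original answer; in practice each row would be built from `link.text` inside the loop.

```python
import csv


def save_hotels(rows, path):
    """Write (offset, name) rows to a CSV file and return the number of rows written.

    `rows` is any iterable of (offset, hotel_name) tuples. This helper is
    hypothetical and only illustrates how the csv module is used.
    """
    count = 0
    # Python 3: open in text mode with newline='' (not 'wb') and let the
    # file object handle UTF-8 encoding instead of calling s.encode().
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['offset', 'name'])  # header row (assumed column names)
        for offset, name in rows:
            writer.writerow([offset, name.strip()])
            count += 1
    return count


if __name__ == '__main__':
    # Placeholder rows standing in for values scraped in the loop above.
    sample = [(0, 'Example Hotel A'), (30, 'Example Hotel B')]
    print(save_hotels(sample, 'hotels_Data.csv'), 'rows written')
```

Opening the file in text mode matters here: in Python 3, `csv.writer` expects strings, so the `'wb'` mode and the manual `.encode("utf-8")` from the question would raise a `TypeError`.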

However, there may be more than 250 offers, so instead of hard-coding 250 you have to read the link to the last page to get the correct value.

import requests
from bs4 import BeautifulSoup

offset = 0
url = 'https://www.tripadvisor.com/Hotels-g295424-oa' + str(offset) + '-Dubai_Emirate_of_Dubai-Hotels.html#EATERY_LIST_CONTENTS'

r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")

for link in soup.find_all('a', {'last'}):
    page_number = link.get('data-page-number')
    last_offset = int(page_number) * 30
    print('last offset:', last_offset)

Then use last_offset+1 as the upper bound: range(0, last_offset+1, 30)
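
As a rough sketch of that last step, the page URLs can be generated from `last_offset` like this. The function name `hotel_page_urls` and the `page_size` parameter are assumptions for illustration; the URL pattern is the one used in the snippets above and may no longer match the live site.

```python
def hotel_page_urls(last_offset, page_size=30):
    """Return the listing URLs for offsets 0, page_size, ... up to last_offset."""
    base = ('https://www.tripadvisor.com/Hotels-g295424-oa{}'
            '-Dubai_Emirate_of_Dubai-Hotels.html#EATERY_LIST_CONTENTS')
    # last_offset + 1 makes the upper bound of range() inclusive.
    return [base.format(offset)
            for offset in range(0, last_offset + 1, page_size)]


if __name__ == '__main__':
    for url in hotel_page_urls(90):
        print(url)
```

Each URL can then be passed to `requests.get()` and parsed the same way as the first page.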


EDIT: Restaurants use JavaScript and AJAX to load data.

import requests
from bs4 import BeautifulSoup

size = 30

# direct url - doesn't have expected information
#url = 'https://www.tripadvisor.com/Restaurants-g187791-Rome_Lazio.html'

# url used by AJAX
url = 'https://www.tripadvisor.com/RestaurantSearch?Action=PAGE&geo=187791&ajax=1&itags=10591&sortOrder=relevance&o=a' + str(size) + '&availSearchEnabled=true&eaterydate=2017_04_27&date=2017-04-28&time=20%3A00%3A00&people=2'

r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")

link = soup.find_all('a')[-1]
page_number = link.get('data-page-number')
last_offset = int(page_number) * size # *30
print('last offset:', last_offset)

offset = link.get('data-offset')
offset = int(offset) + size # +30
print('offset:', offset)
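
To request further result pages from this AJAX endpoint, the offset inside the `o=a...` parameter has to change. Below is a minimal sketch, assuming the remaining query parameters can be reused exactly as in the URL above (they are copied from it and may be outdated). The helper name `restaurant_page_url` is hypothetical.

```python
from urllib.parse import urlencode


def restaurant_page_url(offset):
    """Build the AJAX search URL for the given result offset (0, 30, 60, ...)."""
    params = {
        'Action': 'PAGE',
        'geo': '187791',
        'ajax': '1',
        'itags': '10591',
        'sortOrder': 'relevance',
        'o': 'a' + str(offset),  # the offset is prefixed with the letter "a"
        'availSearchEnabled': 'true',
        'eaterydate': '2017_04_27',
        'date': '2017-04-28',
        'time': '20:00:00',  # urlencode escapes ":" as %3A
        'people': '2',
    }
    return 'https://www.tripadvisor.com/RestaurantSearch?' + urlencode(params)


if __name__ == '__main__':
    print(restaurant_page_url(60))
```

Building the query string with `urlencode` keeps the escaping consistent and makes the offset the only value that changes between requests.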

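Comments:

Did you ask TripAdvisor whether you're allowed to do this? Don't they give you access through an API?

No, they only provide the API to people who have a business (for official use). I'm a student and I just need some data for my project.

You could try using something like Selenium to find the "next page" button on the page. It takes a bit longer than BS because it actually opens a browser window to interact with, but it could be a quick fix.

The portal uses the values 30, 60, 90, 120, etc. instead of 1, 2, 3 for the next pages, because there are 30 offers per page.

@furas can you tell me how to do that? I need your help. Can you tell me how to get the offset for restaurants from TripAdvisor? I used your method above for hotels and it works fine, but it doesn't work for restaurants. Please help me.

@Hifzaahmad what does "doesn't work" mean? I haven't checked the page, but restaurants may use different tags or different pagination, or even JavaScript. That wouldn't be surprising.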