
Python tree.xpath returns an empty list


I'm trying to write a program that can scrape a given website. So far I have:

from lxml import html
import requests

page = requests.get('https://www.cruiseplum.com/search#{"numPax":2,"geo":"US","portsMatchAll":true,"numOptionsShown":20,"ppdIncludesTaxTips":true,"uiVersion":"split","sortTableByField":"dd","sortTableOrderDesc":false,"filter":null}')

tree = html.fromstring(page.content)

date = tree.xpath('//*[@id="listingsTableSplit"]/tr[2]/td[1]/text()')

ship = tree.xpath('//*[@id="listingsTableSplit"]/tr[2]/td[2]/text()')

length = tree.xpath('//*[@id="listingsTableSplit"]/tr[2]/td[4]/text()')

meta = tree.xpath('//*[@id="listingsTableSplit"]/tr[2]/td[6]/text()')

price = tree.xpath('//*[@id="listingsTableSplit"]/tr[2]/td[7]/text()')

print('Date: ', date)
print('Ship: ', ship)
print('Length: ', length)
print('Meta: ', meta)
print('Price: ', price)
When I run this, the lists come back empty.
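For what it's worth, an empty list from `tree.xpath()` only means the path matched nothing, since XPath never raises for a missing node. One way to narrow down *where* the match fails is to test the path one step at a time. A minimal sketch, using hypothetical stand-in markup rather than the real page (on the real site the table's rows are filled in by JavaScript, so the raw HTML that `requests` sees has no `<tr>` elements):

```python
from lxml import html

# Hypothetical stand-in for the raw HTML requests would receive:
# the table element exists, but its rows are added later by JavaScript.
doc = html.fromstring("""
<html><body>
  <table id="listingsTableSplit"></table>
</body></html>
""")

# XPath does not raise for missing nodes -- it just returns an empty
# list, which is exactly the symptom in the question.
rows = doc.xpath('//*[@id="listingsTableSplit"]/tr[2]/td[1]/text()')
print(rows)  # []

# Checking each step of the path shows where the match stops:
print(bool(doc.xpath('//*[@id="listingsTableSplit"]')))     # True: table exists
print(bool(doc.xpath('//*[@id="listingsTableSplit"]/tr')))  # False: no rows
```

If the element at the top of the path matches but a later step doesn't, the missing part is usually content that only appears after JavaScript runs.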

I'm very new to Python and to coding in general, so any help you can offer would be greatly appreciated.

Thanks

The problem seems to be the URL you're navigating to. Opening that URL in a browser triggers a prompt asking whether you want to restore the bookmarked search.

I don't see a simple workaround. Clicking "Yes" triggers a JavaScript action rather than an actual redirect with different parameters.
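One concrete reason `requests` never sees the search parameters: everything after the `#` is a URL fragment, which the browser keeps client-side for the page's JavaScript, and which HTTP clients do not send to the server at all. A small stdlib sketch (URL shortened for readability):

```python
from urllib.parse import urlsplit

# Shortened version of the URL from the question; the part after '#'
# is a fragment, not a query string.
u = 'https://www.cruiseplum.com/search#{"numPax":2,"geo":"US","uiVersion":"split"}'

parts = urlsplit(u)
# An HTTP client such as requests sends only scheme://host/path?query;
# the fragment stays on the client, so the server just serves /search.
print(parts.path)      # /search
print(parts.fragment)  # {"numPax":2,"geo":"US","uiVersion":"split"}
```

So the server returns the generic search page, and only the page's own JavaScript could interpret the fragment.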


I suggest using a tool like Selenium for this.

First, the link you're using is not right; this is the correct link (reached after clicking the "Yes" button, at which point the site downloads the data and returns to this link):

Second, when you fetch the response object with requests, the content data hidden in the table is not returned:

from lxml import html
import requests

u = 'https://www.cruiseplum.com/search#{%22numPax%22:2,%22geo%22:%22US%22,%22portsMatchAll%22:true,%22numOptionsShown%22:20,%22ppdIncludesTaxTips%22:true,%22uiVersion%22:%22split%22,%22sortTableByField%22:%22dd%22,%22sortTableOrderDesc%22:false,%22filter%22:null}'
r = requests.get(u)
t = html.fromstring(r.content)

for i in t.xpath('//tr//text()'):
    print(i)
This returns:

Recent update: new computer-optimized interface and new filters
Want to track your favorite cruises?
Login or sign up to get started.
Login / Sign Up
Loading...
Email status
Unverified
My favorites & alerts
Log out
Want to track your favorite cruises?
Login or sign up to get started.
Login / Sign Up
Loading...
Email status
Unverified
My favorites & alerts
Log out
Date Colors:
(vs. selected)
Lowest Price
Lower Price
Same Price
Higher Price
Even with requests-html, the content is still hidden:

from requests_html import HTMLSession
session = HTMLSession()
r = session.get(u)
You need to use Selenium to access the hidden HTML content:

from lxml import html
from selenium import webdriver
import time

u = 'https://www.cruiseplum.com/search#{%22numPax%22:2,%22geo%22:%22US%22,%22portsMatchAll%22:true,%22numOptionsShown%22:20,%22ppdIncludesTaxTips%22:true,%22uiVersion%22:%22split%22,%22sortTableByField%22:%22dd%22,%22sortTableOrderDesc%22:false,%22filter%22:null}'
driver = webdriver.Chrome(executable_path=r"C:\chromedriver.exe")
driver.get(u)

time.sleep(2)

driver.find_element_by_id('restoreSettingsYesEncl').click()
time.sleep(10)  # wait until the website downloads the data; without this we can't move on

elem = driver.find_element_by_xpath("//*")
source_code = elem.get_attribute("innerHTML")

t = html.fromstring(source_code)

for i in t.xpath('//td[@class="dc-table-column _1"]/text()'):
    print(i.strip())

driver.quit()
This returns the first column (ship names):

As you can see, the content in the table can now be accessed using Selenium's get_attribute("innerHTML").

The next step is to scrape the rows (ship, itinerary, dates, region, ...) and store them in a CSV file (or any other format), then do the same for all 4051 pages.
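A sketch of that storage step, using hypothetical row data (in the real script these tuples would be built from the XPath queries over the `innerHTML` source); `io.StringIO` stands in here for a real file opened with `open('cruises.csv', 'w', newline='')`:

```python
import csv
import io

# Hypothetical rows (ship, date, length, price) as they might come back
# from the per-row XPath queries; the values here are made up.
rows = [
    ("Costa Luminosa", "2020-03-05", "7", "$499"),
    ("Navigator Of The Seas", "2020-03-07", "4", "$329"),
]

buf = io.StringIO()  # stand-in for a real CSV file handle
writer = csv.writer(buf)
writer.writerow(["ship", "date", "length", "price"])  # header row
writer.writerows(rows)

print(buf.getvalue())
```

The same loop would then run once per results page, appending rows as it goes.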

Did you manage to narrow down the problem? If you're new to Python and to coding in general, I'd suggest holding off on web scraping at first. Learn the basics of programming, then try out a web-scraping library. Are you familiar with web pages, the DOM, XPath, CSS, and JavaScript?

Thanks for the reply, but this didn't help :)
Output of the Selenium script above (first column):
Costa Luminosa
Navigator Of The Seas
Navigator Of The Seas
Carnival Ecstasy
Carnival Ecstasy
Carnival Ecstasy
Carnival Victory
Carnival Victory
Carnival Victory
Costa Favolosa
Costa Favolosa
Costa Favolosa
Costa Smeralda
Carnival Inspiration
Carnival Inspiration
Carnival Inspiration
Costa Smeralda
Costa Smeralda
Disney Dream
Disney Dream