Python: unable to get the actual markup from a page with BeautifulSoup

python selenium python-3.x web-scraping


I am trying to scrape the following URL with BeautifulSoup and Selenium:

http://starwood.ugc.bazaarvoice.com/3523si-en_us/115/reviews.djs?format=embeddedhtml&page=2&scrollToTop=true

I tried this code:

active_review_page_html = browser.page_source
active_review_page_html = active_review_page_html.replace('\\', "")
hotel_page_soup = BeautifulSoup(active_review_page_html, "html.parser")
print(hotel_page_soup)

But the data it returns looks like this:

;&lt;span class="BVRRReviewText"&gt;Hotel accommodations and staff were fine ....

But I need to scrape that span from the page using:

for review_div in hotel_page_soup.select("span .BVRRReviewText"):

How can I get the real markup from that URL?

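First, you gave us the wrong link. Instead of the page you are trying to scrape, you gave us a link to a JS file that takes part in loading the page, and parsing that file would be an unnecessary challenge.

Second, you don't need BeautifulSoup in this case. selenium itself is good at locating elements and extracting their text or attributes, so no extra step is needed here.

Here is a working example that uses the actual page containing the reviews you want: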

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # or webdriver.Firefox()
driver.get('http://www.starwoodhotels.com/sheraton/property/reviews/index.html?propertyID=115&language=en_US')

# wait for the reviews to load
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "span.BVRRReviewText")))

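# get reviews (find_elements_by_css_selector was removed in Selenium 4)
for review_div in driver.find_elements(By.CSS_SELECTOR, "span.BVRRReviewText"):
    print(review_div.text)
    print("---")

# quit() closes the window and also ends the driver session
driver.quit()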

Prints:

This is not a low budget hotel . Yet the hotel offers no amenities. Nothing and no WiFi. In fact, you block the wifi that comes with my celluar plan. I am a part of 2 groups that are loyal to the Sheraton, Alabama A&M and the 9th Episcopal District AMEChurch but the Sheraton is not loyal to us.
---
We are a company that had (5) guest rooms at the hotel. Despite having a credit card on file for room and tax charges, my guest was charged the entire amount to her personal credit card. It has taken me (5) PHONE CALLS and my own time and energy to get this bill reversed. I guess leaving a message with information and a phone number numerous times is IGNORED at this hotel. You can guarantee that we will not return with our business. YOu may thank Kimerlin or Kimberly in your accounting office for her lack of personal service and follow through for the lost business in the future.
---
...

I deliberately left the pagination for you to handle. Let me know if you run into difficulties.
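For readers who want a starting point for the pagination, here is a minimal sketch, not a tested solution for this site. It assumes the "next page" control can be found with a CSS selector; `next_selector` and the helper name `collect_reviews` are hypothetical, and the real selector has to be taken from the page by inspecting it. The fixed pause is a crude placeholder, and an explicit wait such as the `WebDriverWait` used above is likely more robust.

```python
import time

# Same string value as selenium's By.CSS_SELECTOR; in real use, pass By.CSS_SELECTOR.
CSS = "css selector"


def collect_reviews(driver, next_selector, max_pages=5, pause=1.0):
    """Collect review texts page by page from a Selenium-like driver.

    next_selector is a hypothetical CSS selector for the "next page" link.
    """
    reviews = []
    for _ in range(max_pages):
        # read every review on the page that is currently loaded
        for span in driver.find_elements(CSS, "span.BVRRReviewText"):
            reviews.append(span.text)

        # stop when there is no "next page" link left
        next_links = driver.find_elements(CSS, next_selector)
        if not next_links:
            break

        next_links[0].click()
        time.sleep(pause)  # crude pause while the next batch of reviews loads
    return reviews
```

It would be called with the `driver` from the example above, for instance `collect_reviews(driver, "a.some-next-link")`, where the selector string is only a placeholder.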

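active_review_page_html is your markup!

But it also looks like ;&lt;span class="BVRRReviewText"&gt;. If you visit that URL, it is not the same as the output I get from Selenium. Or should I replace the escaped entities with the real characters?

Where does this active_review_page_html come from?

How do I navigate through the review pages? As you can see, if you click page 2, the URL does not change, whereas the JS link for page 2 is what I want to crawl. I am currently moving through the pages by grabbing the href values of the a elements in the pagination div.

@Mani That is what I left for you to implement. Please ask a separate question describing the difficulties you currently have. Thanks.

Hey, I can scrape the page with Selenium, but I don't want to use it. Please see my question.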
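On the escaped output shown in the question: one possible reading is that the browser displayed the JS file as text, so page_source contains HTML entities such as &lt; instead of real tags. Under that assumption, a minimal sketch of recovering the markup with the standard library could look like this. The helper name `recover_markup` and the `sample` string are made up for illustration (the sample is modeled on the output in the question), and removing every backslash is the question's own approach, which may damage text that legitimately contains backslashes.

```python
import html


def recover_markup(page_source):
    """Undo HTML-entity escaping, then drop JS backslash escapes."""
    unescaped = html.unescape(page_source)  # "&lt;span&gt;" becomes "<span>"
    return unescaped.replace("\\", "")      # as in the question: removes every backslash


# Made-up sample modeled on the output shown in the question
sample = (
    ';&lt;span class=\\"BVRRReviewText\\"&gt;'
    'Hotel accommodations and staff were fine&lt;\\/span&gt;'
)
print(recover_markup(sample))
```

The recovered string could then be passed to BeautifulSoup with an explicit parser and queried with "span.BVRRReviewText". Note that the selector in the question, "span .BVRRReviewText" with a space, matches elements with that class nested inside a span, not the span itself.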