Scraping with Selenium and Python


python selenium selenium-webdriver web-scraping


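I'm trying to dive into the world of Selenium, but I'm having trouble understanding how things work.

To start with, I just want to learn how to scrape a website.

Take this website as an example.

I want to be able to scrape all the available coupons and return the title, date, and URL link of each one.

Right now I can do this in BeautifulSoup using: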

search_coupon = soup.find_all('div',{'class':'td_module_1 td_module_wrap td-animation-stack'})

for coupon in search_coupon:
    coupon_title = coupon.find('h3',{'class':'entry-title td-module-title'}).text
    coupon_date = coupon.find('span',{'class':'td-post-date'}).text
    coupon_url = coupon.find('a').get('href')
    print(coupon_title, coupon_date, coupon_url)

How do I do this with Selenium?

I can't seem to retrieve the objects I want the same way.

Help!! :)

You can start with the following:

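from selenium.webdriver.common.by import By

# Basic helper functions for later use.
# `browser` is assumed to be an existing WebDriver instance.
# Selenium 4 style: find_element(By.X, ...) replaces the removed find_element_by_x helpers.
def clickOnId(id):
    browser.find_element(By.ID, id).click()

def clickOnXpath(xpath):
    browser.find_element(By.XPATH, xpath).click()

def clickOnClass(class_name):
    browser.find_element(By.CLASS_NAME, class_name).click()

def TypeInId(id, toBeTyped):
    # Type into the first element that matches the id
    elems = browser.find_elements(By.ID, id)
    elems[0].send_keys(toBeTyped)

def TypeInXpath(xpath, toBeTyped):
    # Type into the first element that matches the XPath
    elems = browser.find_elements(By.XPATH, xpath)
    elems[0].send_keys(toBeTyped)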

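You can also look at the code for getting started with Selenium.

To automate a browser with Selenium:

First, download the driver for your browser (chromedriver for Chrome, as used below, or the corresponding driver for Firefox) and save it somewhere.

Second, create a variable that holds the browser webdriver and point it at the driver's path, for example:

driver = webdriver.Chrome(service=Service(r'C:/path/to/chromedriver.exe'))

(Selenium 4 takes the path through a Service object from selenium.webdriver.chrome.service; older releases accepted executable_path=... directly.)

Third, define empty lists to hold the data for each attribute scraped from the page, like this:

coupon_title = []  # List to store coupon titles

The reason is that this makes it easy to save the data into a data frame later for further analysis.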
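As a rough sketch of why separate lists are convenient: once the three lists are filled, they can be zipped into one record per coupon and written out for later analysis. The sample values, the `to_records` and `records_to_csv` helpers, and the column names are illustrative assumptions, not part of the original answer. The same list of dicts could also be passed to `pandas.DataFrame(...)` if pandas is installed.

```python
import csv
import io


def to_records(titles, dates, urls):
    """Combine three parallel lists into one dict per coupon."""
    return [
        {"title": t, "date": d, "url": u}
        for t, d, u in zip(titles, dates, urls)
    ]


def records_to_csv(records):
    """Serialize the records to CSV text, header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["title", "date", "url"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values standing in for scraped data (hypothetical).
    coupon_title = ["Course A", "Course B"]
    coupon_date = ["Jan 1", "Jan 2"]
    coupon_url = ["https://example.com/a", "https://example.com/b"]
    print(records_to_csv(to_records(coupon_title, coupon_date, coupon_url)))
```

Keeping one list per attribute means each list becomes one column, which is the shape most tabular tools expect.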

A minimal reproducible example is given below:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

coupon_title = []  # List to store coupon titles
coupon_date = []   # List to store coupon dates
coupon_url = []    # List to store coupon URLs

# Selenium 4 takes the driver path via Service; older releases used executable_path=...
driver = webdriver.Chrome(service=Service(r'C:/temp/chromedriver.exe'))
driver.get("https://udemycoupons.me/")
content = driver.page_source  # HTML as loaded by the browser
driver.quit()                 # Close the browser once the HTML is captured

soup = BeautifulSoup(content, 'html.parser')
search_coupon = soup.find_all('div', {'class': 'td_module_1 td_module_wrap td-animation-stack'})

for coupon in search_coupon:
    title = coupon.find('h3', {'class': 'entry-title td-module-title'}).text
    date = coupon.find('span', {'class': 'td-post-date'}).text
    url = coupon.find('a').get('href')
    # Append to the lists rather than overwriting them with a single value
    coupon_title.append(title)
    coupon_date.append(date)
    coupon_url.append(url)
    print(title, date, url)

Running it prints the title, date, and URL of each coupon.

Hope this helps.
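For comparison, here is a minimal sketch of the same extraction done with Selenium's own locators instead of BeautifulSoup. The CSS selectors are derived from the class names in the question and are assumed to still match the page; `extract_coupons` and `main` are hypothetical helper names, and `webdriver.Chrome()` with no arguments assumes Selenium can locate a Chrome driver on its own.

```python
CSS = "css selector"  # the string value behind selenium's By.CSS_SELECTOR


def extract_coupons(driver):
    """Return a (title, date, url) tuple for each coupon card on the current page."""
    results = []
    for card in driver.find_elements(CSS, "div.td_module_1.td_module_wrap"):
        title = card.find_element(CSS, "h3.entry-title").text
        date = card.find_element(CSS, "span.td-post-date").text
        url = card.find_element(CSS, "a").get_attribute("href")
        results.append((title, date, url))
    return results


def main():
    # Imported here so extract_coupons can be reused without launching a browser.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumes a Chrome driver can be found
    try:
        driver.get("https://udemycoupons.me/")
        for title, date, url in extract_coupons(driver):
            print(title, date, url)
    finally:
        driver.quit()  # always close the browser, even on errors
```

Calling `main()` runs it against the live site. Note that Selenium's `.text` returns only the visible text of an element, so results can differ slightly from BeautifulSoup's `.text`.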

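Note: this website is fake.

So you can mix Beautiful Soup and Selenium together. Suppose I want to log in to Udemy and add free courses to my account: do I need to log in every time I visit a page? The reason I ask is that Selenium, every time you run...

@MartynBell Beautiful Soup and Selenium are two very different things. If you read its documentation, Beautiful Soup is a Python library for pulling data out of HTML and XML files. It plays no part in opening or visiting web pages. That is exactly why Selenium exists. I hope this answers your question.

Hopefully this is the last question. At the bottom of the items I'm trying to collect there is a "next page" button. I think the class name is "td-icon-menu-right", but when I use

next_page = driver.find_element_by_class_name("td-icon-menu-right").click()

I get an error message:

element not interactable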

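Where did I go wrong?

@MartynBell That's a good question. My take is that there are many possible solutions. From a beginner's point of view, I would first think of clicking the "next page" button. But if I ran into a difficulty like yours, I'd look for another way. Before the next button there are numbered page buttons. Clicking them one by one, I see a common URL pattern such as https://udemycoupons.me/page/2/ and https://udemycoupons.me/page/3/. I also see that the maximum number of pages is shown as "Page 1 of 179". Can you think of how you would solve this? If you can't work it out, ask a new question and I'll answer.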
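One possible way to act on that hint, sketched under assumptions: read the page count from the "Page 1 of 179" label and build each page URL directly instead of clicking the button. `parse_last_page` and `page_urls` are hypothetical helpers, and treating page 1 as the site root is an assumption (only the /page/2/ and /page/3/ URLs are confirmed above).

```python
import re


def parse_last_page(label):
    """Extract the total page count from text like 'Page 1 of 179'."""
    match = re.search(r"of\s+(\d+)", label)
    if match is None:
        raise ValueError(f"no page count found in {label!r}")
    return int(match.group(1))


def page_urls(base_url, last_page):
    """Build the URL of every listing page, from page 1 through last_page."""
    base = base_url.rstrip("/")
    # Assumption: page 1 is the site root; later pages follow /page/N/.
    return [base + "/"] + [f"{base}/page/{n}/" for n in range(2, last_page + 1)]


if __name__ == "__main__":
    last = parse_last_page("Page 1 of 179")
    urls = page_urls("https://udemycoupons.me/", last)
    print(len(urls), urls[1], urls[-1])
```

Each URL can then be passed to `driver.get(...)` in a loop, reusing the same scraping code for every page, which sidesteps the non-interactable button entirely.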