Python: Scraping an entire column with Selenium


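I am trying to get my Python scraper code to work, but I can't manage it. A little help would be useful, as I am still a beginner. The code runs, but then it crashes and exports only a single job to my CSV (a seemingly random one), without reporting any error. Could someone with more experience give me some hints? Thanks in advance.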

from selenium import webdriver
import pandas as pd
from bs4 import BeautifulSoup
options = webdriver.FirefoxOptions()
driver = webdriver.Firefox()
driver.maximize_window()
df = pd.DataFrame(columns=["Title","Location","Company","Salary","Sponsored","Description"])
for i in range(25):
    driver.get('https://www.indeed.co.in/jobs?q=artificial%20intelligence&l=India&start='+str(i))
    jobs = []
    driver.implicitly_wait(20)
    for job in driver.find_elements_by_class_name('result'):
        soup = BeautifulSoup(job.get_attribute('innerHTML'),'html.parser')
        try:
            title = soup.find("a",class_="jobtitle").text.replace("\n","").strip()
        except:
            title = 'None'
        try:
            location = soup.find(class_="location").text
        except:
            location = 'None'
        try:
            company = soup.find(class_="company").text.replace("\n","").strip()
        except:
            company = 'None'
        try:
            salary = soup.find(class_="salary").text.replace("\n","").strip()
        except:
            salary = 'None'
        try:
            sponsored = soup.find(class_="sponsoredGray").text
            sponsored = "Sponsored"
        except:
            sponsored = "Organic"
sum_div = job.find_element_by_class_name('summary')
try:
    sum_div.click()
except:
    close_button = driver.find_elements_by_class_name('popover-x-button-close')[0]
    close_button.click()
    sum_div.click()
driver.implicitly_wait(2)
try:
    job_desc = driver.find_element_by_css_selector('div#vjs-desc').text
    print(job_desc)
except:
    job_desc = 'None'
df = df.append({'Title':title,'Location':location,"Company":company,"Salary":salary,
                "Sponsored":sponsored,"Description":job_desc},ignore_index=True)
df.to_csv(r"C:\Users\Desktop\Python\Newtest.csv",index=False)

This looks like a simple indentation problem. Part of the code is running outside the for loop.

from selenium import webdriver
import pandas as pd
from bs4 import BeautifulSoup

from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

options = Options()    
driver = webdriver.Chrome(ChromeDriverManager().install(), options=options)


df = pd.DataFrame(columns=["Title","Location","Company","Salary","Sponsored","Description"])

for i in range(0,50,10):
    driver.get('https://www.indeed.co.in/jobs?q=artificial%20intelligence&l=India&start='+str(i))
    jobs = []
    driver.implicitly_wait(20)
    

    for job in driver.find_elements_by_class_name('result'):

        soup = BeautifulSoup(job.get_attribute('innerHTML'),'html.parser')
        
        try:
            title = soup.find("a",class_="jobtitle").text.replace("\n","").strip()
            
        except:
            title = 'None'

        try:
            location = soup.find(class_="location").text
        except:
            location = 'None'

        try:
            company = soup.find(class_="company").text.replace("\n","").strip()
        except:
            company = 'None'

        try:
            salary = soup.find(class_="salary").text.replace("\n","").strip()
        except:
            salary = 'None'

        try:
            sponsored = soup.find(class_="sponsoredGray").text
            sponsored = "Sponsored"
        except:
            sponsored = "Organic"


        sum_div = job.find_element_by_class_name('summary')

        try:
            sum_div.click()
        except:
            close_button = driver.find_elements_by_class_name('popover-x-button-close')[0]
            close_button.click()
            sum_div.click()
        driver.implicitly_wait(2)
        try:            
            job_desc = driver.find_element_by_css_selector('div#vjs-desc').text
            print(job_desc)
        except:
            job_desc = 'None'   

        df = df.append({'Title':title,'Location':location,"Company":company,"Salary":salary,
                                "Sponsored":sponsored,"Description":job_desc},ignore_index=True)

df.to_csv("test.csv",index=False)

I used Chrome instead of Firefox, but I don't think that is where the problem lies. I simply indented your code correctly.
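
To illustrate why the indentation matters, here is a minimal sketch that does not touch Selenium at all. The function names and the sample titles are hypothetical and only stand in for the scraper's loop; the point is that a statement dedented out of a for loop runs once, after the loop has finished, and only sees the values from the last iteration.

```python
def collect_inside(items):
    """Append inside the loop: one row per item."""
    rows = []
    for item in items:
        title = item.upper()
        rows.append(title)  # indented: runs on every iteration
    return rows


def collect_outside(items):
    """Append after the loop: only the last item is kept."""
    rows = []
    for item in items:
        title = item.upper()
    rows.append(title)  # dedented: runs once, after the loop ends
    return rows


# Hypothetical job titles standing in for the scraped results.
titles = ["ml engineer", "data scientist", "ai researcher"]
print(collect_inside(titles))
print(collect_outside(titles))
```

The second function is the same shape of mistake as an append that has slipped out of the loop over the results: the code still runs without an error, but only a single row reaches the output.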

Also, using a bare except, without naming the exception you expect, is not a good idea.
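
As a rough sketch of that advice: BeautifulSoup's find returns None when nothing matches, so reading .text from the result raises AttributeError, and that is the one exception the field lookups need to handle. The helper name text_or_default and the stand-in objects below are assumptions made for illustration; they are not part of the original code.

```python
from types import SimpleNamespace


def text_or_default(node, default="None"):
    """Return the node's cleaned-up text, or `default` when the node is missing."""
    try:
        return node.text.replace("\n", "").strip()
    except AttributeError:  # node is None, i.e. the element was not found
        return default


# Stand-ins for what soup.find(...) would return (hypothetical data).
found = SimpleNamespace(text="\nData Scientist\n")
missing = None

print(text_or_default(found))
print(text_or_default(missing))
```

With a helper like this, a line such as title = text_or_default(soup.find("a", class_="jobtitle")) could replace each four-line try/except block, and any unexpected error (a typo, a lost browser session) would surface instead of being silently turned into 'None'.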

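It appears to be an indentation problem. The code in my answer gave me a CSV file with 1931 rows.

Thanks for your help. I tried your code on Chrome and it works well, but on Firefox the problem persists. Now I get "TabError: inconsistent use of tabs and spaces in indentation" at the try line: try: sum_div.click(). I kept changing the spaces, without success.

This error means that you used 4 spaces in some places and 1 tab in others. If you go through the code and change every run of 4 spaces to a tab, it will resolve the error.

@DariusFlorea if this solved your problem or answered your question, please consider marking the answer as accepted.

I finally solved it. Thanks @Christopher Holder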
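
For anyone who runs into the same TabError, the following is a minimal sketch for locating the offending lines rather than hunting for them by eye. The function names and the sample snippet are hypothetical. It relies only on the built-in compile, which parses the source without running it, and on str.expandtabs. Either convention (all tabs or all spaces) works as long as it is consistent within a block; PEP 8 recommends spaces.

```python
def lines_with_tab_indent(source):
    """Return 1-based numbers of non-blank lines whose indentation contains a tab."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            hits.append(number)
    return hits


def is_tab_consistent(source):
    """Compile the source without running it; False if Python raises TabError."""
    try:
        compile(source, "<snippet>", "exec")
    except TabError:
        return False
    return True


# Hypothetical snippet: line 2 is indented with 8 spaces, line 3 with one tab.
mixed = "try:\n        x = 1\n\ty = 2\nexcept Exception:\n    pass\n"

print(lines_with_tab_indent(mixed))
print(is_tab_consistent(mixed))

# Replace tabs with spaces; the width (8 here) is an assumption and must match
# how the tab-indented lines were meant to line up.
fixed = mixed.expandtabs(8)
print(is_tab_consistent(fixed))
```

Note that expandtabs rewrites every tab in the text, including any inside string literals, so it is worth reviewing the result before saving it over the original file.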